Sumo Logic and Machine Data Intelligence — DevOps Days Austin

July 2, 2014

Today’s interview from DevOps Days Austin features Sumo Logic’s co-founder and CTO, Christian Beedgen. If you’re not familiar with Sumo Logic, it’s a log management and analytics service. I caught up with Christian right after he got off stage on day one.

Some of the ground Christian covers:

  • What does Sumo Logic do?
  • How is it different from Splunk and Loggly?
  • What partners and technology make up the Sumo Logic ecosystem?
  • What areas will Sumo Logic focus on in the coming year?

Still to come from DevOps Days Austin:  Rackspace, Dell Cloud Manager, Cote’s Keynote

Extra-credit reading

Pau for now….


App Think Tank – Some takeaways

February 3, 2014

The week before last, Dell Services held a think tank out in Silicon Valley at the venture firm NEA. We had 10 panelists representing both old-school and new-school organizations: Intel, Safeway, American Cancer Society, Puppet Labs, NGINX, Stormpath, Stanford Business School, 451 Research and TechCrunch (see the complete list of participants below). I had the honor of moderating the panel.

The group discussed the challenges of the new app-centric world as well as how to leverage both the “Four horsemen of IT du jour”: Cloud, Mobile, Social and Big Data, and the “three enablers”: Open Source, DevOps and APIs.

You can see more pictures from the event, as well as watch the entire think tank, which ran a bit under three and a half hours, here. Additionally, over the next few days I will be posting blogs around four short video snippets from the event.

  • Video 1:  What do customers expect
  • Video 2:  IT is facing competition for the first time ever
  • Video 3:  The persistently, ubiquitously connected to the network era
  • Video 4:  The web of C level relationships

Some takeaways

I was really impressed by how well the participants gelled as a group, with just the right amount of tension :). Below are a few of the interesting tidbits I took away (I was surprised how much of the conversation came back to culture). You can also check out SDNCentral’s summary of the event.

Q: What are the customer expectations of services today?

  • They are personalized and immediate (friction is a killer)
  • They are agile and rapidly improve
  • They are available from any device, anywhere, and are always on

Q: What big bets are you making?

  • “Open Source all the way” – Barry Libenson, CIO, Safeway
  • Mobile first, platform agnostic – Jay Ferro, CIO, American Cancer Society
  • Hire learners, not vertical experts; we want entrepreneurial problem solvers – Ranga Jayaraman, CIO, Stanford Business School
  • Everything must be services – Das Kamhout, IT Principal Engineer, Intel
  • Set up a learning culture that is tolerant of failure – Luke Kanies, CEO, Puppet Labs
  • Clean APIs and modularity – Alex Salazar, CEO, Stormpath

Q: If your son or daughter wanted to be a CIO, what advice would you give them?

  • Be really choosy about the company you work for
  • Learn to entertain opposing ideas and paths
  • Agility, flexibility, adaptability
  • Learn to let go
  • Learn to be a hacker
  • Learn mindfulness

Watch the whole event here.

Participants

  • Barry Libenson – SVP and CIO, Safeway
  • Jay Ferro – CIO, American Cancer Society
  • Ranga Jayaraman – Associate Dean & CIO, Stanford GSB
  • Luke Kanies – Founder & CEO, Puppet Labs
  • Alex Salazar – Co-Founder & CEO, Stormpath
  • Alex Williams – Blogger & Journalist, TechCrunch
  • Michael Cote – Research Director, Infrastructure Software, 451 Research
  • Sarah Novotny – Tech Evangelist, NGINX
  • Das Kamhout – IT Principal Engineer, Intel
  • Jimmy Pike – Sr. Fellow and Chief Architect, Dell

Extra-credit reading

  • IT can’t thrive unless CIOs can change the culture – SDNCentral
  • “New Age of Apps” Think Tank to be streamed live – Barton’s blog

Pau for now…


Consumerization: setting the bar for IT

January 28, 2014

Mark Stouse of BMC has asked various people in the industry to answer seven short questions for his series Marking Predictions for ’14. The questions are around Cloud Computing, Big Data and Consumerization.

To give you a taste of what I was thinking about, here is my response to the second question and why I think Consumerization is a big deal:

Cloud Computing, Big Data or Consumerization: which trend do you feel is having the most impact on IT today and why?

Consumerization, because it sets the bar for how technology should look and be designed.  Workers want technology in the workplace that is as easy to use and intuitive as the consumer applications and tech products they use at home.  Consumerization has set a high bar for IT but one that I believe will ultimately benefit all involved through greater adoption, satisfaction and productivity.

You can see my complete responses on Mark’s blog and learn, among other things, why I think Tony Stark is like big data.

Pau for now…


Whitepaper: Learning from Web Companies to drive Innovation

December 4, 2013

Today I finally get to debut a white paper that Michael Cote, now of 451 Research, and I started quite a while back:

Learning from Web companies to drive Innovation – Embracing DevOps, Scale and Open Source Software

The basic theme of the paper is that Web companies set the agenda for the IT industry and that enterprises can benefit by understanding and following their practices.

The paper’s key themes:

  • Web companies are characterized by Open Source software and a three-tiered architecture:
    • A scale-out infrastructure
    • A data tier that utilizes big data
    • An application tier supported by a proliferation of development languages
  • Developers are kingmakers and must be supported and allowed to innovate
  • DevOps is a key trend that brings developers and operations together to reduce friction and increase velocity

If this looks at all interesting, please check it out. It should be a quick read and hopefully we’ve written it in a way that is accessible to a wide audience.

Extra-credit viewing

Pau for now…


Dell and Sputnik go to OSCON

July 18, 2013

Next week, Michael Cote, a whole bunch of other Dell folks and I will be heading out to Portland for the 15th annual OSCON-ana-polooza. We will have two talks that you might want to check out:

Cote and I will be giving the first, and the second will be led by Joseph George and James Urquhart.

Sputnik Shirt

And speaking of Project Sputnik, we will be giving away three of our XPS 13 developer editions:  one as a door prize at the OpenStack birthday party, one as a drawing at our booth and one to be given away at James and Joseph’s talk listed above.

We will also have a limited number of the shirts shown to the right, so stop by the booth.

But wait, there’s more….

To learn firsthand about Dell’s open source solutions, be sure to swing by booth #719, where we will have experts on hand to talk to you about our wide array of solutions:

  • OpenStack cloud solutions
  • Hadoop big data solutions
  • Crowbar
  • Project Sputnik (the client to cloud developer platform)
  • Dell Multi-Cloud Manager (the platform formerly known as “Enstratius”)
  • Hyperscale computing systems

Hope to see you there.

Pau for now…


Time Lapse: Building Dell’s Big Data/OpenStack MDC — allowing customers to test at hyperscale

April 1, 2013

Back in September I posted an entry about the Modular Data Center that we set up in the Dell parking lot. Here is a time-lapse video showing the MDC and the location being built out.

The MDC allows customers to test solutions at scale. It is running OpenStack and various Big Data goodies such as Hadoop, HBase, Cassandra, MongoDB, Gluster, etc.

Customers can tap into the MDC from Dell’s solution centers around the world and do proofs of concept as well as competitive bake-offs between various big data technologies, so they can determine which might best suit their environment and use case.

Extra-credit reading


Dell’s Big Data escalator pitch

February 24, 2012

At our sales kickoff in Vegas, Rob Hirschfeld chose a unique vehicle to succinctly convey our Big Data story here at Dell.  Check out the video below to hear one of our chief software architects for our Big Data and OpenStack solutions explain, in less than 90 seconds, what we are up to in the space and the value it brings customers.

Extra credit reading

Pau for now…


Hadoop World, a belated summary

February 13, 2012

With O’Reilly’s big data conference Strata coming up in just a couple of weeks, I thought I might as well get around to finally writing up my notes from Hadoop World. The event, which was put on by Cloudera, was held last November 8-9 in New York City. There were over 1,400 attendees from 580 companies and 27 countries, with two-thirds of the audience being technical.

Growing beyond geek fest

The event itself has picked up significant momentum over the last three years, going from 500 attendees, to 900 the second year, to over 1,400 this past year. The tone has gone from geek-fest to an event also focused on business problems; for example, one of the keynotes was given by Larry Feinsmith, managing director of the Office of the CIO at JPMorgan Chase. Besides Dell, other large companies like HP, Oracle and Cisco also participated.

As a platinum sponsor, Dell had both a booth and a technical presentation. At the event we announced that we would be open sourcing the Crowbar barclamp for Hadoop, and at our booth we showed off the Dell | Hadoop Big Data Solution, which is based on Cloudera Enterprise.

Cutting’s observations

Doug Cutting, the father of Hadoop, a Cloudera employee and chairman of the Apache Software Foundation, gave a much anticipated keynote. Here are some of the key things I caught:

  • Still young: While Cutting felt that Hadoop had made tremendous progress, he saw it as still young, with lots of missing parts and niches to be filled.
  • Bigtop: He talked about the Apache “Bigtop” project, an open source program to pull together the various pieces of the Hadoop ecosystem. He explained that Bigtop is intended to serve as the basis for the Cloudera Distribution of Hadoop (CDH), much the same way Fedora is the basis for RHEL (Red Hat Enterprise Linux).
  • “Hadoop” as “Linux”: Cutting also talked about how Hadoop has become the kernel of the distributed OS for big data. He explained that, much the same way that “Linux” is technically only the kernel of the GNU/Linux operating system, people are using the word Hadoop to mean the entire Hadoop ecosystem, including utilities.

Interviews from the event

To get more of the flavor of the event here is a series of interviews I conducted at the show, plus one where I got the camera turned on me:

Extra-credit reading

Blogs regarding Dell’s Crowbar announcement

Hadoop Glossary

  • Hadoop ecosystem
    • Hadoop: An open source platform, developed at Yahoo, that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. It is particularly suited to large volumes of unstructured data such as Facebook comments and Twitter tweets, email and instant messages, and security and application logs.
    • MapReduce: a software framework for easily writing applications which process vast amounts of data (multi-terabyte data sets) in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. Hadoop acts as a platform for executing MapReduce. MapReduce came out of Google. (A toy simulation of the map/shuffle/reduce flow follows this glossary.)
    • HDFS: Hadoop’s Distributed File system allows large application workloads to be broken into smaller data blocks that are replicated and distributed across a cluster of commodity hardware for faster processing.
  • Major Hadoop utilities:
    • HBase: The Hadoop database that supports structured data storage for large tables.   It provides real time read/write access to your big data.
    • Hive:  A data warehousing solution built on top of Hadoop.  An Apache project
    • Pig: A platform for analyzing large data that leverages parallel computation.  An Apache project
    • ZooKeeper:  Allows Hadoop administrators to track and coordinate distributed applications.  An Apache project
    • Oozie: a workflow engine for Hadoop
    • Flume: a service designed to collect data and put it into your  Hadoop environment
    • Whirr: a set of libraries for running cloud services.  It’s ideal for running temporary Hadoop clusters to carry out a proof of concept, or to run a few one-time jobs.
    • Sqoop: a tool designed to transfer data between Hadoop and relational databases.  An Apache project
    • Hue: a browser-based desktop interface for interacting with Hadoop
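To make the MapReduce and HDFS definitions above a bit more concrete, here is a toy, self-contained Python simulation of the map, shuffle and reduce phases doing a word count. This is only a sketch of the programming model; on a real cluster the framework would run the same logic in parallel across HDFS blocks.

```python
from collections import defaultdict

# Toy input: on a real cluster each line would live in an HDFS block.
documents = ["the quick brown fox", "the lazy dog", "the fox"]

def map_phase(line):
    # Emit (word, 1) for every word, as a MapReduce mapper would.
    for word in line.split():
        yield word, 1

# Shuffle: the framework groups mapper output by key between the phases.
groups = defaultdict(list)
for doc in documents:
    for word, one in map_phase(doc):
        groups[word].append(one)

# Reduce: sum the grouped counts for each word.
counts = {word: sum(ones) for word, ones in groups.items()}
print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```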

Web Glossary part two: Data tier

January 18, 2012

Here is part two of three of the Web glossary I compiled. As I mentioned in my last two entries, in compiling this I pulled information from various and sundry sources across the Web, including Wikipedia, community and company websites, and the brain of Cote.

Enjoy

General terms

  • Structured data: Data that can be organized in a structure, e.g. rows or columns, so that it is identifiable. The most universal form of structured data is a relational database such as a SQL database or Access.
  • Unstructured data:  Data that has no identifiable structure. Unstructured data typically includes bitmap images/objects, text and other data types that are not part of a database. Most enterprise data today can actually be considered unstructured. An email is considered unstructured data.
  • Big Data: Data characterized by one or more of the following characteristics:  Volume – A large amount of data, growing at large rates; Velocity – The speed at which the data must be processed and a decision made;  Variety – The range of data, types and structure to the data
  • Relational Database Management Systems (RDBMS): These databases are the incumbents in enterprises today and store data in rows and columns. They are created using a special computer language, structured query language (SQL), that is the standard for database interoperability. Examples: IBM DB2, MySQL, Microsoft SQL Server, PostgreSQL, Oracle RDBMS, Informix, Oracle Rdb, etc.
  • NoSQL: refers to a class of databases that 1) are intended to perform at internet (Facebook, Twitter, LinkedIn) scale and 2) reject the relational model in favor of other (key-value, document, graph) models.  They often achieve performance by having far fewer features than SQL databases and focus on a subset of use cases.  Examples: Cassandra, Hadoop, MongoDB, Riak
  • Recommendation engine: A recommendation engine takes a collection of frequent itemsets as input and generates a recommendation set for a user by matching the current user’s activity against the discovered patterns. The recommendation engine is an online process, so its efficiency and scalability are key, e.g. people who bought X often also bought Y. (A toy co-occurrence sketch follows this list.)
  • Geo-spatial targeting: the practice of mapping advertising, offers and information based on geolocation.
  • Behavioral targeting: a technique used by online publishers and advertisers to increase the effectiveness of their campaigns.  Behavioral targeting uses information collected on an individual’s web-browsing behavior, such as the pages they have visited or the searches they have made, to select which advertisements to display to that individual.
  • Clickstream analysis: On a Web site, clickstream analysis is the process of collecting, analyzing, and reporting aggregate data about which pages visitors visit and in what order – the result of the succession of mouse clicks each visitor makes (that is, the clickstream). There are two levels of clickstream analysis: traffic analysis and e-commerce analysis.
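As a rough illustration of the recommendation-engine idea above (“people who bought X often also bought Y”), here is a toy co-occurrence sketch in Python. The baskets and item names are invented for the example; real engines mine frequent itemsets at far larger scale.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories: each set is one user's basket.
baskets = [
    {"laptop", "mouse", "dock"},
    {"laptop", "mouse"},
    {"mouse", "keyboard"},
]

# Count how often each ordered pair of items appears in the same basket.
co_occurrence = Counter()
for basket in baskets:
    for x, y in permutations(basket, 2):
        co_occurrence[(x, y)] += 1

def recommend(item, top_n=3):
    """Rank the items most frequently bought alongside `item`."""
    scored = [(other, n) for (x, other), n in co_occurrence.items() if x == item]
    return [other for other, _ in sorted(scored, key=lambda t: -t[1])[:top_n]]

print(recommend("laptop"))  # ['mouse', 'dock']
```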

Projects/Entities

  • Gluster: a software company acquired by Red Hat that provides an open source platform for scale-out Public and Private Cloud Storage.
  • Relational Databases
    • MySQL:  the most popular open source RDBMS.  It represents the “M” in the LAMP stack.  It is now owned by Oracle.
    • Drizzle: A version of MySQL that is specifically targeted at the cloud. It is currently an open source project without a commercial entity behind it.
    • Percona:  A MySQL support and consulting company that also supports Drizzle.
    • PostgreSQL: aka Postgres, an object-relational database management system (ORDBMS) available for many platforms, including Linux, FreeBSD, Solaris, Windows and Mac OS X.
    • Oracle DB – not used so much in new WebTech companies, but still a major database in the development world.
    • SQL Server – Microsoft’s RDBMS

  • NoSQL Databases

    • MongoDB: an open source, high-performance database written in C++. Many Linux distros include a MongoDB package, including CentOS, Fedora, Debian, Ubuntu and Gentoo. Prominent users include Disney Interactive Media Group, the New York Times, foursquare, bit.ly and Etsy. 10gen is the commercial backer of MongoDB. (A minimal usage sketch appears after this glossary.)
    • Riak: a NoSQL database/datastore written in Erlang, from the company Basho, whose founders came out of the content delivery network Akamai.
    • Couchbase: formed from the merger of CouchOne and Membase. It offers Couchbase Server, powered by Apache CouchDB, and is available in both Enterprise and Community editions. The author of CouchDB was a prominent Lotus Notes architect.
    • Cassandra: a scalable, key/value NoSQL database with no single point of failure, originating from Facebook, where it was built to handle their message inboxes. Backed by DataStax, which came out of Rackspace.
    • Mahout: a scalable machine learning and data mining library; an analytics engine for doing machine learning (e.g., recommendation engines and scenarios where you want to infer relationships).
  • Hadoop ecosystem
    • Hadoop: An open source platform, developed at Yahoo, that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. It is particularly suited to large volumes of unstructured data such as Facebook comments and Twitter tweets, email and instant messages, and security and application logs.
    • MapReduce: a software framework for easily writing applications which process vast amounts of data (multi-terabyte data sets) in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. Hadoop acts as a platform for executing MapReduce. MapReduce came out of Google.
    • HDFS: Hadoop’s Distributed File system allows large application workloads to be broken into smaller data blocks that are replicated and distributed across a cluster of commodity hardware for faster processing.
  • Major Hadoop utilities:
    • HBase: The Hadoop database that supports structured data storage for large tables.   It provides real time read/write access to your big data.
    • Hive:  A data warehousing solution built on top of Hadoop.  An Apache project
    • Pig: A platform for analyzing large data that leverages parallel computation.  An Apache project
    • ZooKeeper:  Allows Hadoop administrators to track and coordinate distributed applications.  An Apache project
    • Oozie: a workflow engine for Hadoop
    • Flume: a service designed to collect data and put it into your  Hadoop environment
    • Whirr: a set of libraries for running cloud services.  It’s ideal for running temporary Hadoop clusters to carry out a proof of concept, or to run a few one-time jobs.
    • Sqoop: a tool designed to transfer data between Hadoop and relational databases.  An Apache project
    • Hue: a browser-based desktop interface for interacting with Hadoop
  • Cloudera: a company that provides a Hadoop distribution similar to the way Red Hat provides a Linux distribution.  Dell is using Cloudera’s distribution of Hadoop for its Hadoop solution.
  • Solr: an open source enterprise search platform from the Apache Lucene project. Backed by the commercial company Lucid Imagination.
  • ElasticSearch: an open source, distributed search engine built on top of Lucene (raw search middleware).
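Since MongoDB shows up throughout these posts, here is a minimal, hypothetical example of working with it from Python via the pymongo driver. The database, collection and field names are made up, and it assumes a mongod instance running locally on the default port.

```python
from pymongo import MongoClient

# Assumes a local mongod listening on the default port 27017.
client = MongoClient("mongodb://localhost:27017/")
db = client["weblogs"]  # hypothetical database name

# Documents are schemaless, JSON-like dicts -- no table definition needed.
db.events.insert_one({"user": "alice", "page": "/pricing", "ms": 42})
db.events.insert_one({"user": "bob", "page": "/pricing", "ms": 180})

# Query by example: all events for a given page, slowest first.
for event in db.events.find({"page": "/pricing"}).sort("ms", -1):
    print(event["user"], event["ms"])
```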

Extra-credit reading

Pau for now…


Hadoop World: Talking to Splunk’s Co-founder

December 4, 2011

Last but not least in the 10 interviews I conducted while at Hadoop World is my talk with Splunk’s CTO and co-founder Erik Swan. If you’re not familiar with Splunk, think of it as a search engine for machine data, allowing you to monitor and analyze what goes on in your systems. To learn more, listen to what Erik has to say:

Some of the ground Erik covers:

  • What is Splunk and what do they do?
  • (1:43) The announcement they made at Hadoop World about integrating with Hadoop and what that means.
  • (4:25) How Erik and Rob Das got the idea to get involved in the wacky world of machine data and to create Splunk.

Extra-credit reading

Pau for now…


Hadoop World: NoSQL database MongoDB

November 28, 2011

I’m getting near the end of the interviews that I did while at Hadoop World earlier this month, just one more after this (with Splunk’s CTO and co-founder).

Today’s entry features a talk I had with Nosh Petigara, director of product strategy at 10gen, the company behind MongoDB.

Some of the ground that Nosh covers

  • Who is 10gen and what is MongoDB
  • (0:29) How does Nosh define NoSQL
  • (1:20) What use cases is Mongo best at
  • (2:14) Some examples of customers using Mongo (foursquare, Disney and MTV) and what they’re using it for
  • (3:08) How Mongo and Hadoop work together
  • (4:03) What’s in Mongo’s future that Nosh is excited about

Extra-credit reading

  • Mongo Conference: MongoSV (Dec 9 in Silicon Valley)

Pau for now…


Hadoop World: Battery Ventures EIR Todd P.

November 16, 2011

Todd Papaioannou has been in Big Data for a while. He built the original engineering team at Greenplum, worked at Teradata for five years and most recently, before joining Battery Ventures as an Entrepreneur in Residence, served as Yahoo’s Chief Cloud Architect.

I grabbed some time with Mr. P to learn what it means to be an EIR and what he’s seeing in the industry from his vantage point.

Some of the ground Todd covers

  • Todd’s background
  • (0:45) What is an Entrepreneur in Residence and how did Todd become one
  • (2:45) What trends is he seeing in the space and how does he feel the market’s evolving
  • (4:00) What are his big takeaways from this year’s Hadoop World

But wait, there’s more!

Stay tuned for more interviews from last week’s Hadoop World. On tap are:

  • John Gray of Facebook
  • Erik Swan of Splunk
  • Nosh Petigara of 10gen/MongoDB

Extra-credit reading

Pau for now…


Hadoop World: Learning about NoSQL database Couchbase

November 10, 2011

The next in my series of video interviews from Hadoop World is with Mark Azad, who covers technical solutions for Couchbase. If you’re not familiar with Couchbase, it’s a NoSQL database provider; the company was formed when, earlier this year, CouchOne and Membase merged.

Here’s what Mark had to say.

Some of the ground Mark covers

  • What is Couchbase and what is NoSQL
  • How Couchbase works with Hadoop
  • What its product lineup looks like and the new combined offering coming next year
  • Some of Couchbase’s customers and how Zynga uses them
  • What excites Mark the most about the upcoming year in Big Data

Extra-credit reading

Pau for now…


Hadoop World: O’Reilly Strata conference chair, Edd Dumbill

November 10, 2011

Yesterday, Hadoop World 2011 wrapped here in New York. During the event I was able to catch up with a bunch of folks representing a wide variety of members of the ecosystem. On the first day I caught up with Edd Dumbill of O’Reilly Media, who writes about big data for O’Reilly Radar and is also the GM for O’Reilly’s big data conference, Strata.

Here’s what Ed had to say.

Some of the ground Ed covers

  • What is Strata and what does it cover
  • How will this year’s conference differ from last year’s
  • Which customer types are making the best use of Hadoop, and will Strata verticalize going forward
  • What is Edd looking forward to most in the upcoming Strata.

Extra-credit reading

Pau for now…


Hadoop World: Cloudera CEO reflects on this year’s event

November 9, 2011

A few hours ago the third annual Hadoop World conference wrapped up here in New York City. It has been a packed couple of days with keynotes, sessions and exhibits from all sorts of folks within the greater big data ecosystem.

I caught up with master of ceremonies and Cloudera CEO Mike Olson to get his thoughts on the event and predictions for next year.

Some of the ground Mike covers:

  • How this year’s event compares to the first two and how it’s grown (it ain’t Mickey Rooney anymore)
  • (2:06) Key trends and customers at the event
  • (4:02) Mike’s thoughts on the Dell/Cloudera partnership
  • (5:35) Looking forward to Hadoop World 2012 and where to go next

Stay tuned

If you’re interested in seeing more interviews from Hadoop World 2011, be sure to check back. I have eight other vlogs that I will be posting in the upcoming days with folks from MongoDB, O’Reilly Media, Facebook, Couchbase, Karmasphere, Splunk, Ubuntu and Battery Ventures.

Extra-credit reading:

Pau for now…


Hadoop World: Accel’s $100M Big Data fund

November 9, 2011

Yesterday Hadoop World kicked off here in New York City. As part of the opening keynotes, Ping Li of Accel Partners got on stage and announced that they are opening a $100 million fund focusing on big data. If you’re not familiar with Accel, they are the venture capital firm that has invested in such hot companies as Facebook, Cloudera, Couchbase, Groupon and Fusion-io.

I grabbed some time with Ping at the end of the sessions yesterday to learn more about their fund:

Some of the ground Ping covers:

  • What areas within the data world the fund will focus on.
  • Who are some of the current players within their portfolio that fall into the big data space.
  • What trends Ping’s seeing within the field of Big Data.
  • How to engage with Accel and why it would make sense to work with them.

Extra-credit reading:

Pau for now…


Big Data is the new Cloud

October 12, 2011

Big Data represents the next not-completely-understood, got-to-have strategy. This first dawned on me about a year ago and has continued to become clearer as the phenomenon has gained momentum. Contributing to Big Data-mania is Hadoop, today’s weapon of choice for taming and harnessing mountains of unstructured data, a project whose celebrity exerts its own immense gravitational pull.

So what

But what is the value of slogging through these mountains of data?  In a recent Forrester blog, Brian Hopkins lays it out very simply:

We estimate that firms effectively utilize less than 5% of available data. Why so little? The rest is simply too expensive to deal with. Big data is new because it lets firms affordably dip into that other 95%. If two companies use data with the same effectiveness but one can handle 15% of available data and one is stuck at 5%, who do you think will win?

The only problem is that while unstructured data (email, clickstream data, photos, web logs, etc.) makes up the vast majority of today’s data, the majority of incumbent data solutions aren’t designed to handle it. So what do you do?

Deal with it

Hadoop, which I mentioned above, is your first line of offense when attacking big data. Hadoop is an open source, highly scalable compute and storage platform. It can be used to collect, tidy up and store boatloads of structured and unstructured data. In the case of enterprises it can be combined with a data warehouse and then linked to analytics (in the case of web companies, they forgo the warehouse).

And speaking of web companies, Hopkins explains:

Google, Yahoo, and Facebook used big data to deal with web scale search, content relevance, and social connections, and we see what happened to those markets. If you are not thinking about how to leverage big data to get the value from the other 95%, your competition is.

So will Big Data truly displace Cloud as the current must-have, buzz-tastic phenomenon in IT? I’m thinking in many circles it will. While less of a tectonic shift, Big Data’s more “modest” goals and concrete application make it easier to draw a direct line between effort and business return. This in turn will drive greater interest, tire kicking and then implementation. But I wouldn’t kick the tires for too long, for as the web players have learned, Big Data is a mountain of straw just waiting to be spun into gold.

Extra-credit reading:

Pau for now…


Does Hadoop compete with or complement the data warehouse?

August 12, 2011

Dell’s chief architect for big data, Aurelian Dumitru (aka A.D.), presented a talk at OSCON the week before last with the heady title “Hadoop – Enterprise Data Warehouse Data Flow Analysis and Optimization.” The session, which was well attended, explored the integration between Hadoop and the Enterprise Data Warehouse. A.D. posted a fairly detailed overview of his session on his blog, but if you want a great high-level summary, check this out:

Some of the ground AD covers

  • Mapping out the data life cycle: Generate -> Capture -> Store -> Analyze -> Present
  • Where does Hadoop play and where does the data warehouse?  Where do they overlap?
  • Where do BI tools fit into the equation?
  • To learn more, check out dell.com/hadoop

Extra-credit reading


Introducing the Dell | Cloudera solution for Apache Hadoop — Harnessing the power of big data

August 4, 2011

Data continues to grow at an exponential rate, and no place is this more obvious than in the Web space. Not only is the amount exploding but so is the form the data takes, whether that’s transactional, documents, IT/OT, images, audio, text, video, etc. Additionally, much of this new data is unstructured or semi-structured, which traditional relational databases were not built to deal with.

Enter Hadoop, an Apache open source project which, when combined with MapReduce, allows the analysis of entire data sets, rather than sample sizes, of structured and unstructured data types. Hadoop lets you chomp through mountains of data faster and get to insights that drive business advantage quicker. It can provide near “real-time” data analytics for click-stream data, location data, logs, rich data, marketing analytics, image processing, social media association, text processing, etc. More specifically, Hadoop is particularly suited for applications such as the following (a rough sketch of a clickstream job follows the list):

  • Search Quality — search attempts vs. structured data analysis; pattern recognition
  • Recommendation engine — batch processing; filtering and prediction (i.e. use information to predict what similar users like)
  • Ad-targeting – batch processing; linear scalability
  • Threat analysis for spam fighting and detecting click fraud — batch processing of huge datasets; pattern recognition
  • Data “sandbox” – “dump” all data in Hadoop; batch processing (i.e. analysis, filtering, aggregations, etc.); pattern recognition
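As a rough illustration of the clickstream-style jobs above, here is a minimal, hypothetical Hadoop Streaming script that counts clicks per campaign; Streaming lets you write the mapper and reducer as plain scripts rather than Java. The log format, file name and cluster paths are all invented for this sketch.

```python
#!/usr/bin/env python
# clickcount.py -- hypothetical Hadoop Streaming sketch.
# Mapper:  python clickcount.py map
# Reducer: python clickcount.py reduce
import sys

def mapper():
    # Input lines look like "timestamp,campaign,action" (invented format).
    for line in sys.stdin:
        ts, campaign, action = line.rstrip("\n").split(",")
        if action == "click":
            print(campaign + "\t1")

def reducer():
    # Streaming sorts mapper output by key, so each campaign arrives grouped.
    current, count = None, 0
    for line in sys.stdin:
        campaign, _, n = line.rstrip("\n").partition("\t")
        if campaign != current:
            if current is not None:
                print(current + "\t" + str(count))
            current, count = campaign, 0
        count += int(n)
    if current is not None:
        print(current + "\t" + str(count))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()

# Submitting it to a cluster would look something like (paths illustrative):
#   hadoop jar hadoop-streaming.jar -input /clicks -output /counts \
#     -file clickcount.py \
#     -mapper "python clickcount.py map" -reducer "python clickcount.py reduce"
```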

The Dell | Cloudera solution

Although Hadoop is a very powerful tool, it can be a bit daunting to implement and use. This fact wasn’t lost on the founders of Cloudera, who set up the company to make Hadoop easier to use by packaging it and offering support. Dell has joined with this Hadoop pioneer to provide the industry’s first complete Hadoop solution (aptly named “the Dell | Cloudera solution for Apache Hadoop”).

The solution comprises Cloudera’s distribution of Hadoop, running on optimized Dell PowerEdge C2100 servers with the Dell PowerConnect 6248 switch, delivered with joint service and support. Dell offers two flavors of this big data solution: one based on the free download of Cloudera’s distribution of Hadoop, and one based on Cloudera’s paid enterprise version of Hadoop.

It comes with its own “crowbar” and DIY option

The Dell | Cloudera solution for Apache Hadoop also comes with Crowbar, the recently open-sourced Dell-developed software, which provides the necessary tools and automation to manage the complete lifecycle of Hadoop environments.  Crowbar manages the Hadoop deployment from the initial server boot to the configuration of the main Hadoop components allowing users to complete bare metal deployment of multi-node Hadoop environments in a matter of hours, as opposed to days. Once the initial deployment is complete, Crowbar can be used to maintain, expand, and architect a complete data analytics solution, including BIOS configuration, network discovery, status monitoring, performance data gathering, and alerting.

The solution also comes with a reference architecture and deployment guide, so you can assemble it yourself, or Dell can build and deploy the solution for you, including rack and stack, delivery and implementation.

Some of the coverage (added Aug 12)

Extra-credit reading

Pau for now…


Hadoop Summit: Talking to the CEO of MapR

July 10, 2011

I’m now back from vacation and am continuing with my series of videos from the Hadoop Summit. The one-day summit, which was very well attended, was held in Santa Clara the last week of June. One of the two Platinum sponsors was MapR Technologies. MapR is particularly interesting since it has taken a different approach to productizing Hadoop than the current leader, Cloudera.

I got some time with their CEO and co-founder John Schroeder to learn more about MapR:

Some of the ground John covers

  • The announcements they made at the event
  • (0:16) How John got the idea to start MapR: what tech trends he was seeing and what customer problems he was learning about.
  • (1:43) How MapR’s approach to Hadoop differs from Cloudera (and Hortonworks)
  • (3:49) How the Hadoop community is growing, both with regard to Apache and to the commercial entities that are developing around it, and the importance of this growth.

Extra-credit reading

Pau for now…

