July 18, 2013
Next week, Michael Cote, a whole bunch of other Dell folk and I will be heading out to Portland for the 15th annual OSCON-ana-polooza. We will have two talks that you might want to check out:
Cote and I will be giving the first, and the second will be led by Joseph George and James Urquhart.
And speaking of Project Sputnik, we will be giving away three of our XPS 13 developer editions: one as a door prize at the OpenStack birthday party, one as a drawing at our booth and one to be given away at James and Joseph’s talk listed above.
We will also have a limited number of shirts to give away, so stop by the booth.
But wait, there’s more….
To learn firsthand about Dell’s open source solutions, be sure to swing by booth #719, where we will have experts on hand to talk to you about our wide array of solutions:
- OpenStack cloud solutions
- Hadoop big data solutions
- Project Sputnik (the client to cloud developer platform)
- Dell Multi-Cloud Manager (the platform formerly known as “Enstratius”)
- Hyperscale computing systems
Hope to see you there.
Pau for now…
April 1, 2013
Back in September I posted an entry about the Modular Data Center that we set up in the Dell parking lot. Here is a time-lapse video showing the MDC and the location being built out.
The MDC allows customers to test solutions at scale. It is running OpenStack and various Big Data goodies such as Hadoop, HBase, Cassandra, MongoDB and Gluster.
Customers can tap into the MDC from Dell’s solution centers around the world and run proofs of concept as well as competitive bake-offs between various big data technologies, so they can determine which might best suit their environment and use case.
September 11, 2012
Why use valuable internal real estate when you can just set up a Modular Data Center (MDC) in your parking lot? The point wasn’t lost on the Dell Solution Center team who, with help from our partner Intel, is doing just that here in Round Rock.
The new MDC, which should be online in a few weeks, will host Dell’s OpenStack-Powered Cloud and Apache Hadoop solutions for customers to test drive and build POCs in Dell Solution Centers around the world.
Here’s the MDC being lowered into place yesterday.
Here are some pics I snapped this morning when I went down to get my coffee.
Pau for now…
February 24, 2012
At our sales kickoff in Vegas, Rob Hirschfeld chose a unique vehicle to succinctly convey our Big Data story here at Dell. Check out the video below to hear one of our chief software architects for our Big Data and OpenStack solutions explain, in less than 90 seconds, what we are up to in the space and the value it brings customers.
Extra credit reading
Pau for now…
January 18, 2012
Here is part two of three of the Web glossary I compiled. As I mentioned in my last two entries, in compiling this I pulled information from various and sundry sources across the Web including Wikipedia, community and company web sites and the brain of Cote.
- Structured data: Data that can be organized in a structure, e.g., rows and columns, so that it is identifiable. The most common home for structured data is a relational database, such as a SQL database or Access.
- Unstructured data: Data that has no identifiable structure. Unstructured data typically includes bitmap images/objects, text and other data types that are not part of a database. Most enterprise data today can actually be considered unstructured. An email is considered unstructured data.
- Big Data: Data defined by one or more of the following characteristics: Volume – A large amount of data, growing at large rates; Velocity – The speed at which the data must be processed and a decision made; Variety – The range of data types and structures
- Relational Database Management Systems (RDBMS): These databases are the incumbents in enterprises today and store data in rows and columns. They are queried and managed using a special computer language, Structured Query Language (SQL), that is the standard for database interoperability. Examples: IBM DB2, MySQL, Microsoft SQL Server, PostgreSQL, Oracle RDBMS, Informix, Oracle Rdb, etc.
- NoSQL: refers to a class of databases that 1) are intended to perform at internet (Facebook, Twitter, LinkedIn) scale and 2) reject the relational model in favor of other (key-value, document, graph) models. They often achieve performance by having far fewer features than SQL databases and focusing on a subset of use cases. Examples: Cassandra, HBase, MongoDB, Riak
- Recommendation engine: A recommendation engine takes a collection of frequent itemsets as input and generates a recommendation set for a user by matching the current user’s activity against the discovered patterns, e.g., people who bought X often also bought Y. Because the recommendation engine is an online process, its efficiency and scalability are key.
- Geo-spatial targeting: the practice of targeting advertising, offers and information based on geolocation.
- Behavioral targeting: a technique used by online publishers and advertisers to increase the effectiveness of their campaigns. Behavioral targeting uses information collected on an individual’s web-browsing behavior, such as the pages they have visited or the searches they have made, to select which advertisements to display to that individual.
- Clickstream analysis: On a Web site, clickstream analysis is the process of collecting, analyzing, and reporting aggregate data about which pages visitors visit and in what order. That sequence is the result of the succession of mouse clicks each visitor makes (that is, the clickstream). There are two levels of clickstream analysis: traffic analysis and e-commerce analysis.
- Gluster: a software company acquired by Red Hat that provides an open source platform for scale-out Public and Private Cloud Storage.
- Relational Databases
- MySQL: the most popular open source RDBMS. It represents the “M” in the LAMP stack. It is now owned by Oracle.
- Drizzle: A version of MySQL that is specifically targeted at the cloud. It is currently an open source project without a commercial entity behind it.
- Percona: A MySQL support and consulting company that also supports Drizzle.
- PostgreSQL: aka Postgres, is an object-relational database management system (ORDBMS) available for many platforms including Linux, FreeBSD, Solaris, Windows and Mac OS X.
- Oracle DB – not used so much in new WebTech companies, but still a major database in the development world.
- SQL Server – Microsoft’s RDBMS
- MongoDB: an open source, high-performance database written in C++. Many Linux distros include a MongoDB package, including CentOS, Fedora, Debian, Ubuntu and Gentoo. Prominent users include Disney Interactive Media Group, the New York Times, foursquare, bit.ly and Etsy. 10gen is the commercial backer of MongoDB.
- Riak: a NoSQL database/datastore written in Erlang from the company Basho. Originally used for the Content Delivery Network Akamai.
- Couchbase: formed from the merger of CouchOne and Membase. It offers Couchbase server powered by Apache CouchDB and is available in both Enterprise and Community editions. The author of CouchDB was a prominent Lotus Notes architect.
- Cassandra: A scalable, high-scale key/value NoSQL database with no single point of failure. It originated at Facebook, where it was built to handle their message inboxes. Backed by DataStax, which came out of Rackspace.
- Mahout: A scalable machine learning and data mining library. An analytics engine for doing machine learning (e.g., recommendation engines and scenarios where you want to infer relationships).
- Hadoop ecosystem
- Hadoop: An open source platform, developed at Yahoo, that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. It is particularly suited to large volumes of unstructured data such as Facebook comments and Twitter tweets, email and instant messages, and security and application logs.
- MapReduce: a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. Hadoop acts as a platform for executing MapReduce. MapReduce came out of Google.
- HDFS: The Hadoop Distributed File System allows large application workloads to be broken into smaller data blocks that are replicated and distributed across a cluster of commodity hardware for faster processing.
- Major Hadoop utilities:
- HBase: The Hadoop database that supports structured data storage for large tables. It provides real time read/write access to your big data.
- Hive: A data warehousing solution built on top of Hadoop. An Apache project.
- Pig: A platform for analyzing large data sets that leverages parallel computation. An Apache project.
- ZooKeeper: Allows Hadoop administrators to track and coordinate distributed applications. An Apache project.
- Oozie: a workflow engine for Hadoop
- Flume: a service designed to collect data and put it into your Hadoop environment
- Whirr: a set of libraries for running cloud services. It’s ideal for running temporary Hadoop clusters to carry out a proof of concept, or to run a few one-time jobs.
- Sqoop: a tool designed to transfer data between Hadoop and relational databases. An Apache project.
- Hue: a browser-based desktop interface for interacting with Hadoop
- Cloudera: a company that provides a Hadoop distribution similar to the way Red Hat provides a Linux distribution. Dell is using Cloudera’s distribution of Hadoop for its Hadoop solution.
- Solr: an open source enterprise search platform from the Apache Lucene project. Backed by the commercial company Lucid Imagination.
- Elasticsearch: an open source, distributed search engine built on top of Lucene (raw search middleware).
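To make the MapReduce entry above a bit more concrete, here is a minimal single-process sketch in plain Python of the map, shuffle and reduce phases, using word counting as the example. This is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real cluster would run these phases in parallel across many machines with HDFS underneath.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a cluster, each document (or data block) would be mapped on a
    # different node; here everything runs sequentially in one process.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data about data"]
    print(word_count(docs))
```

The point of the split is that the map and reduce functions never need to know about each other or about where the data lives, which is what lets a framework like Hadoop spread the work over commodity hardware.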
Pau for now…
December 18, 2011
Besides interviewing a bunch of people at Hadoop World, I also got a chance to sit on the other side of the camera. On the first day of the conference I got a slot on SiliconANGLE’s the Cube and was interviewed by Dave Vellante, co-founder of Wikibon, and John Furrier, founder of SiliconANGLE.
-> Check out the video here.
Some of the ground we cover
- How Dell got into the cloud/scale-out arena and how that led us to Big Data
- (2:08) The details behind the Dell|Cloudera solution for Apache Hadoop and our “secret sauce,” Project Crowbar.
- (4:00) Dell’s involvement in and affinity for open source software
- (5:31) Dell’s interest in and strategy around courting developers
- (7:35) Dell’s strategy of Make, Partner or Buy in the cloud space
- (11:10) How real OpenStack is and how it is evolving.
Pau for now…
November 29, 2011
As I mentioned in my previous entry, the code for the Hadoop barclamps is now available at our GitHub repo.
To help you through the process, Crowbar lead architect Rob Hirschfeld has put together the two videos below. The first, Crowbar Build (on cloud server), shows you how to use a cloud server to create a Crowbar ISO using the standard build process. The second, Advanced Crowbar Build (local), shows how to build a Crowbar v1.2 ISO using advanced techniques on a local desktop using a virtual machine.
Crowbar Build (on cloud server)
Advanced Crowbar Build (local)
Pau for now…