This is the last in my three-part Web Glossary series. As I previously explained, in compiling this I pulled information from various and sundry sources across the Web including Wikipedia, community and company web sites and the brain of Cote.
The idea behind the glossary is to help our teams get a better understanding of the wild and wacky world of the Web and Web developers as we move forward with our Web|Tech vertical. I figured I might as well share it with a few friends.
Today’s focus, having worked our way down from the top, is the infrastructure tier (with a short catch-all bucket at the end, “Misc.”).
Infrastructure
General Terms
DevOps: The goal of the DevOps movement is to drive out inefficiency in web shops by bridging the gap (and lessening conflict) between traditional development activity and operations activity. It seeks to address this issue by providing tools and practices to bring these two groups closer together and provide for greater automation of processes. Key tools in this effort are Opscode’s Chef and Puppet Labs’ Puppet, which automate the set-up and management of infrastructure.
PUE: Power Usage Effectiveness is a measure of how efficiently a computer data center uses its power; specifically, how much of the power is actually used by the computing equipment (in contrast to cooling and other overhead). PUE is the ratio of total amount of power used by a computer data center facility to the power delivered to computing equipment. The closer to 1.0, the better the PUE.
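To make the ratio concrete, here is a minimal sketch of the PUE arithmetic. The function names and the kilowatt figures are made up for illustration; they do not come from any real data center.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / power delivered to IT gear."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_share(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Fraction of total power that goes to cooling and other overhead."""
    return 1.0 - it_equipment_kw / total_facility_kw


# Hypothetical facility: 1,500 kW drawn in total, 1,000 kW reaching the servers.
example_pue = pue(1500.0, 1000.0)
example_overhead = overhead_share(1500.0, 1000.0)
print(f"PUE = {example_pue:.2f}, overhead = {example_overhead:.0%}")
```

In this made-up case the PUE is 1.5, meaning that for every watt the computing equipment uses, another half watt goes to overhead.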
Distributed management: refers to the setup, provisioning, maintenance and management of the scale-out infrastructure (either physical or virtual) that has historically been characteristic of web firms and is increasingly typical within traditional enterprise customers. This includes players like Chef and Puppet for provisioning and configuration, New Relic and Splunk for monitoring and management, and Loggly/Eucalyptus/OpenStack/VMware for management monitoring.
Projects/Entities
Crowbar: Crowbar is a Dell-developed open source software framework designed to speed up the installation and configuration of open source cloud software onto bare metal systems. By automating the process, Crowbar can reduce the time needed for installation from days to hours. The software is modular in design, so while the basic functionality is in Crowbar itself, “barclamps” sit on top of it to allow it to work with a variety of projects. There have been barclamps built for OpenStack, Hadoop, CloudFoundry and DreamHost.
Ubuntu: The most popular desktop Linux distribution. On the server side they are supporting OpenStack and have an offering called the Ubuntu Enterprise Cloud. Backed by the commercial company Canonical.
Puppet: a configuration management tool designed to automate the set up and management of infrastructure. A key DevOps tool. It is produced by Puppet Labs.
Chef: a configuration management tool designed to automate the set up and management of infrastructure. A key DevOps tool. It is produced by Opscode, who hosts a cloud-based version of Chef called the Opscode Platform.
Nagios: a popular open source computer system and network monitoring software application. It watches hosts and services, alerting users when things go wrong and again when they get better.
Ganglia: an open source scalable distributed monitoring system for high-performance computing systems such as clusters and grids.
Misc
LAMP stack: Open source stack that provides a viable general purpose web server. The name comes from the first letters of its components: Linux, Apache web server, MySQL and PHP (or Perl or Python). LAMP has become a de facto development standard and is an excellent example of how open source software has made its way into enterprise environments through unofficial channels.
Apache Software Foundation: A decentralized group of developers that produce open source software under the Apache license. Notable projects include: Apache web server, Hadoop, CouchDB, Cassandra, Tomcat, Subversion
Nginx: an open source web server that recently has been gaining considerable traction
Recipes: They encapsulate collections of software resources which are executed in the order defined to configure a system.
Here is part two of three of the Web glossary I compiled. As I mentioned in my last two entries, in compiling this I pulled information from various and sundry sources across the Web including Wikipedia, community and company web sites and the brain of Cote.
Enjoy
General terms
Structured data: Data that can be organized in a structure, e.g. rows and columns, so that it is identifiable. The most common home for structured data is a database, such as a SQL database or Access.
Unstructured data: Data that has no identifiable structure. Unstructured data typically includes bitmap images/objects, text and other data types that are not part of a database. Most enterprise data today can actually be considered unstructured. An email is considered unstructured data.
Big Data: Data characterized by one or more of the following characteristics: Volume – A large amount of data, growing at large rates; Velocity – The speed at which the data must be processed and a decision made; Variety – The range of data, types and structure to the data
Relational Database Management Systems (RDBMS): These databases are the incumbents in enterprises today and store data in rows and columns. They are defined and queried using a special computer language, structured query language (SQL), that is the standard for database interoperability. Examples: IBM DB2, MySQL, Microsoft SQL Server, PostgreSQL, Oracle RDBMS, Informix, Oracle Rdb, etc.
NoSQL: refers to a class of databases that 1) are intended to perform at internet (Facebook, Twitter, LinkedIn) scale and 2) reject the relational model in favor of other (key-value, document, graph) models. They often achieve performance by having far fewer features than SQL databases and focus on a subset of use cases. Examples: Cassandra, Hadoop, MongoDB, Riak
Recommendation engine: A recommendation engine takes a collection of frequent itemsets as input and generates a recommendation set for a user by matching the current user’s activity against the discovered patterns, e.g. people who bought X often also bought Y. The recommendation engine is an online process, so its efficiency and scalability are key.
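The “people who bought X often also bought Y” idea can be sketched with simple co-occurrence counts. This is only an illustration, not how any particular product works; the function names and the sample baskets are invented.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of distinct items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, current_items, top_n=3):
    """Score items that co-occur with what the user already has; skip items they hold."""
    scores = Counter()
    for item in current_items:
        for other, count in co.get(item, {}).items():
            if other not in current_items:
                scores[other] += count
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase history (the "discovered patterns" are just pair counts here).
baskets = [["X", "Y"], ["X", "Y", "Z"], ["X", "W"], ["Y", "Z"]]
co = build_cooccurrence(baskets)
print(recommend(co, {"X"}, top_n=1))
```

The pair counts are built ahead of time, so the online step is only a lookup and a sort. That split is one way to meet the efficiency requirement mentioned above.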
Geo-spatial targeting: the practice of mapping advertising, offers and information based on geo location.
Behavioral targeting: a technique used by online publishers and advertisers to increase the effectiveness of their campaigns. Behavioral targeting uses information collected on an individual’s web-browsing behavior, such as the pages they have visited or the searches they have made, to select which advertisements to display to that individual.
Clickstream analysis: On a Web site, clickstream analysis is the process of collecting, analyzing, and reporting aggregate data about which pages visitors visit in what order – which are the result of the succession of mouse clicks each visitor makes (that is, the clickstream). There are two levels of clickstream analysis, traffic analysis and e-commerce analysis.
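As a rough sketch of the traffic-analysis side, the snippet below rebuilds each visitor’s page sequence from raw click records and counts page-to-page transitions in aggregate. The record layout (visitor id, timestamp, page) and the sample data are assumptions made for the example.

```python
from collections import Counter, defaultdict


def sessions_from_clicks(clicks):
    """Group (visitor_id, timestamp, page) records into ordered page lists per visitor."""
    sessions = defaultdict(list)
    for visitor, _timestamp, page in sorted(clicks, key=lambda c: (c[0], c[1])):
        sessions[visitor].append(page)
    return dict(sessions)


def transition_counts(sessions):
    """Count how often visitors moved from one page directly to another."""
    counts = Counter()
    for pages in sessions.values():
        counts.update(zip(pages, pages[1:]))
    return counts


# Hypothetical click log for two visitors.
clicks = [
    ("a", 1, "/home"), ("a", 2, "/product"), ("a", 3, "/cart"),
    ("b", 1, "/home"), ("b", 2, "/product"), ("b", 3, "/home"),
]
for (src, dst), n in transition_counts(sessions_from_clicks(clicks)).most_common():
    print(f"{src} -> {dst}: {n}")
```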
Projects/Entities
Gluster: a software company acquired by Red Hat that provides an open source platform for scale-out Public and Private Cloud Storage.
Relational Databases
MySQL: the most popular open source RDBMS. It represents the “M” in the LAMP stack. It is now owned by Oracle.
Drizzle: A version of MySQL that is specifically targeted at the cloud. It is currently an open source project without a commercial entity behind it.
Percona: A MySQL support and consulting company that also supports Drizzle.
PostgreSQL: aka Postgres, is an object-relational database management system (ORDBMS) available for many platforms including Linux, FreeBSD, Solaris, Windows and Mac OS X.
Oracle DB – not used so much in new WebTech companies, but still a major database in the development world.
SQL Server – Microsoft’s RDBMS
NoSQL Databases
MongoDB: an open source, high-performance database written in C++. Many Linux distros include a MongoDB package, including CentOS, Fedora, Debian, Ubuntu and Gentoo. Prominent users include Disney Interactive Media Group, New York Times, foursquare, bit.ly and Etsy. 10gen is the commercial backer of MongoDB.
Riak: a NoSQL database/datastore written in Erlang from the company Basho. Originally used for the Content Delivery Network Akamai.
Couchbase: formed from the merger of CouchOne and Membase. It offers Couchbase server powered by Apache CouchDB and is available in both Enterprise and Community editions. The author of CouchDB was a prominent Lotus Notes architect.
Cassandra: A scalable NoSQL database with no single points of failure. A high-scale, key/value database originating from Facebook to handle their message inboxes. Backed by DataStax, which came out of Rackspace.
Mahout: A Scalable machine learning and data mining library. An analytics engine for doing machine learning (e.g., recommendation engines and scenarios where you want to infer relationships).
Hadoop ecosystem
Hadoop: An open source platform, developed at Yahoo that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. It is particularly suited to large volumes of unstructured data such as Facebook comments and Twitter tweets, email and instant messages, and security and application logs.
MapReduce: a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. Hadoop acts as a platform for executing MapReduce. MapReduce came out of Google.
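To show the programming model itself, here is a tiny word count written as map, shuffle and reduce steps in plain Python. It runs in one process and does not use Hadoop’s API; the function names are invented for the illustration. On a real cluster the framework handles the shuffle and spreads the map and reduce work across machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "big clusters"]))
```

Because each map call only sees one document and each reduce call only sees one key, both steps can in principle run in parallel on different machines, which is what makes the model a fit for large clusters.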
HDFS: Hadoop’s Distributed File system allows large application workloads to be broken into smaller data blocks that are replicated and distributed across a cluster of commodity hardware for faster processing.
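The split-and-replicate idea can be sketched in a few lines. This is a toy model only: the round-robin placement below is an assumption made for the example, whereas real HDFS placement is rack-aware and block sizes are far larger than the few bytes used here.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes, replication: int = 3):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


# Hypothetical four-node cluster storing a 10-byte "file" in 4-byte blocks, two copies each.
blocks = split_into_blocks(b"abcdefghij", 4)
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=2)
for block_id, holders in placement.items():
    print(block_id, blocks[block_id], holders)
```

Since every block lives on more than one node, losing a single machine does not lose data, and work on a block can be scheduled on whichever node already holds a copy.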
Major Hadoop utilities:
HBase: The Hadoop database that supports structured data storage for large tables. It provides real time read/write access to your big data.
Hive: A data warehousing solution built on top of Hadoop. An Apache project
Pig: A platform for analyzing large data that leverages parallel computation. An Apache project
ZooKeeper: Allows Hadoop administrators to track and coordinate distributed applications. An Apache project
Oozie: a workflow engine for Hadoop
Flume: a service designed to collect data and put it into your Hadoop environment
Whirr: a set of libraries for running cloud services. It’s ideal for running temporary Hadoop clusters to carry out a proof of concept, or to run a few one-time jobs.
Sqoop: a tool designed to transfer data between Hadoop and relational databases. An Apache project
Hue: a browser-based desktop interface for interacting with Hadoop
Cloudera: a company that provides a Hadoop distribution similar to the way Red Hat provides a Linux distribution. Dell is using Cloudera’s distribution of Hadoop for its Hadoop solution.
Solr: an open source enterprise search platform from the Apache Lucene project. Backed by the commercial company Lucid Imagination.
Elastic Search: an open source, distributed, search engine built on top of Lucene (raw search middleware).
As I mentioned in my last post, one of the ways we are helping our teams get a better understanding of the wild and wacky world of the Web and Web developers is via a glossary we’ve created. In compiling this I pulled information from various and sundry sources across the Web including Wikipedia, community and company web sites and the brain of Cote.
Over the next several entries I will be posting the glossary. Feel free to bookmark it, delete it, offer corrections, comments or additions.
Today I present to you, the Application tier.
enjoy
General terms
Runtime: A programming language e.g. Java, .NET, JavaScript, PHP, Python, Ruby…
Application framework : Provides re-usable templates, methods, and ways of programming applications. Often, these frameworks will provide “widgets” and “libraries” that developers use to create various parts of their application – they may also include the actual tools to create, deploy, and run the final application. Some application frameworks create whole sub-cultures of developers, such as Rails which supports the Ruby programming language. Most application frameworks are open source and free, though there are also many closed source, not-free ones.
Continuous code development lifecycle: releasing software at more frequent intervals (30 days or less) by (a.) doing smaller batches of code, and, (b.) using tools and processes that enable a more lean approach to development. Software released in such a cycle tends to release many small features instead of, in contrast, “traditional” development where 100s of features are bundled up in one version of the software and released every 1-2 years.
Programming languages
Java/.NET: The incumbent enterprise development languages. Very powerful but relatively difficult to learn and take time to program in.
Dynamic languages: e.g. PHP, Perl, Python, JavaScript, and Ruby. They are popular for creating web applications since they are both simpler to learn and faster to code in than traditional enterprise standards like Java. This offers a substantial time to market advantage, particularly for smaller projects for which the benefits of Java are less applicable.
PHP: a server-side scripting language originally designed for web development to produce dynamic web pages. WordPress is written in PHP, as well as Facebook and countless web sites. PHP is infamous for being very quick and easy to get started with (which it is) but turning into a mess of “spaghetti code” after years of work and different programmers. PHP is open source, though Zend, the patron company behind PHP, and others sell “commercial” versions.
Perl: One of the original programming languages of the web, Perl emphasizes a very “Unix way” of programming. Perl can be quick and elegant, but like PHP can result in a pile of hard-to-maintain code in the long term. While Perl was extremely popular in the first Internet bubble, it has since taken a back seat to more popular development worlds such as PHP, Java, and Rails. Perl is open source and there are few, if any, commercial companies behind it.
Python: Like all dynamic languages, Python emphasizes speed of development and code readability. It’s an object-oriented language. Python is something of an evolution of Perl, but is not that closely tied to it. Python emphasizes broadness of functionality while at the same time being a proper, object-oriented programming language (not just a way to write “scripts”). Python enjoys steady popularity; Google uses Python as one of its primary programming languages.
JavaScript: once a minor language used in web browsers, JavaScript has become a stand-alone language on its own known and used by many programmers. Most web applications will include the use of JavaScript.
Ruby: Ruby and Python are very similar in ethos: emphasizing fast coding with a more human-readable syntax. Ruby became famous with the rise of Rails in the mid-2000s, which was a rebellion against the “heavyweight” practices that Java imposed on web development. Ruby is still very popular. Ruby can also be run on top of the Java virtual machine (via JRuby), providing a good bridge to the Java world. Salesforce’s acquired PaaS, Heroku, uses Ruby, and most modern development platforms use Ruby.
Ruby on Rails: a popular web application framework written in Ruby. Rails is frequently credited with making Ruby “famous”.
Scala: A somewhat exotic language, but it has quite a buzz around it. It’s good for massive scale systems that need to be concurrent (lots of people changing lots of things, often the same things, at the same time). Erlang is another language in this area. Scala runs on the Java Virtual Machine and Common Language Runtime. In April 2009 Twitter announced they had switched large portions of their backend from Ruby to Scala and intended to convert the rest. In addition, Foursquare uses Scala and Lift (Lift is a framework for Scala much in the same way Rails is a framework for Ruby.)
R: a programming language and software environment for statistical computing and graphics.
Node.js: (aka “Node”) What’s interesting about Node.js is the idea that it is taking JavaScript which was originally designed to be used in web browsers and using it as a server-side environment. It is intended for writing scalable network programs such as web servers. It was created by Ryan Dahl in 2009, and its growth is sponsored by Joyent, which employs Dahl.
Clojure: A recent dialect of the Lisp programming language that is good for data-intensive applications. It runs on the Java Virtual Machine and Common Language Runtime.
Runtimes and Platforms
Common Language Runtime (CLR): is the virtual machine component of Microsoft’s .NET framework and is responsible for managing the execution of .NET programs.
Java Virtual Machine (JVM) – the underlying execution engine that the Java language runs on top of. It controls access to the hardware, networks, and other “infrastructure” and services outside of the main application written in Java. Of special note is that many languages other than Java can run on the JVM (as with the CLR), e.g., Scala, Ruby, etc. There are many JVMs, and ISVs (IBM, Oracle, etc.) will use their custom JVMs as key differentiators for middleware, mostly around performance, scale-out, and security.
Projects/Entities
OpenShift: Red Hat’s Platform as a Service (PaaS) offering. More specifically, OpenShift is a PaaS software layer that Red Hat runs and manages on top of third party providers – Amazon first with more to follow.
Heroku: A Platform as a Service (PaaS) offering that was acquired by Salesforce.com. It supports development of Ruby on Rails, Java, PHP and Python.
CloudFoundry: A Platform as a Service (PaaS) offering and VMware-led project. Cloud Foundry provides a platform for building, deploying, and running cloud apps using the Spring Framework for Java developers, Rails and Sinatra for Ruby developers, Node.js and other JVM languages/frameworks including Groovy, Grails and Scala.
Joyent: Offers PaaS and IaaS capabilities through the public cloud. Dell resells this capability as a turnkey solution under the name The Dell Cloud Solution for Web applications. Joyent also sponsors the development of Node.js and employs its creator.
GitHub: a web-based hosting service for software development projects that use the Git revision control system. GitHub offers both commercial plans and free accounts for open source projects.
But wait there’s more…
Stay tuned for the next couple of entries when I will cover first the Database tier and then the Infrastructure tier.
A couple years back, on the Public side of the house, Dell set up specific marketing teams to focus on customer needs in three areas: Healthcare, Government and Education. This vertical approach turned out to be a great way to get to better know our customers and their pain points and ultimately meet their needs.
Based on this success, a little while ago we kicked off a similar effort in our commercial business. The first six verticals we are setting up are: Retail, Manufacturing, Financial Services, Web|Tech, Energy and TME (Telco, Media & Entertainment). Web|Tech is the group I belong to (I lead marketing for the group).
Developers, Developers, Developers
In the Internet space we have already had a fair amount of success through our DCS group. The idea with the new Web vertical is to learn even more about the customer set, companies that use the internet as their platform, and take this knowledge along with our accumulated experience, to a wider audience. Two of the key areas of focus of this new vertical will be developers and open source software.
Look it up
One of the ways we are helping our teams get a better understanding of the wild and wacky world of the Web and Web developers is via a glossary we’ve created. In compiling this I pulled information from various and sundry sources across the Web including Wikipedia, community and company web sites and the brain of Cote.
The glossary is organized into the following sections:
[Update Feb 1: I've gone back and linked the entries below]
As I mentioned in my last entry, Mark Shuttleworth of Ubuntu fame stopped by Dell this morning on his way back from CES. Between meetings Mark and I did a couple of quick videos. Here is the second of the two. Whereas the first focused on the client, this one focuses on the Cloud and the back-end.
Mark Shuttleworth, founder of Ubuntu Linux and Chairman of Canonical, the commercial company behind Ubuntu, stopped by Dell for a bunch of meetings this morning. Mark was visiting Austin on his way back from CES in Las Vegas, where he and the team just unveiled Ubuntu TV.
I was able to grab a few minutes with Mark between meetings and get his thoughts on a bunch of topics. Here is the first of two videos we did. You’ll notice that this one ends a bit abruptly; that’s because we got booted out of the conference room we were squatting in. You’ll also notice when I post the second video that we found a much better location for round two.
Some of the ground Mark covers
How was CES and how was Ubuntu TV received?
What is the secret sauce behind Ubuntu TV and how is it different from Google TV?
What is Ubuntu One and how is it different from Apple’s iCloud or Microsoft’s SkyDrive?
What is Unity and how does it tie the client experience together across devices?
This afternoon Matt Ray, Technical Evangelist for Opscode, stopped by Dell’s Round Rock HQ to brief a gaggle of folks on what they are up to. Cote arranged the visit as well as one last month with Puppet Labs, which I unfortunately wasn’t able to make.
After Matt, with some help from teammates on the phone, briefed the Dell gang I grabbed some time with him to get the 5 minute Reader’s Digest version. Here is the result.
Some of the ground Matt covers:
What are Opscode and Chef?
How did they come to be?
The hosted version of Chef (moving from EC2 to Rackspace)
Here is the last in a series of three short videos around cloud computing put together by Dell and Intel. As I mentioned in the last two entries, these videos are part of larger series around key topics like IT reinvention, the consumerization of IT, social media etc.
This last video features myself, Dell’s former CIO Robin Johnson, VP of Dell’s Enterprise Solutions and Strategy, Praveen Asthana and Donna Troy, VP and GM of Solutions Marketing and Sales at Dell.
Some of the ground we cover
How we define cloud computing
How quickly can you evolve to cloud?
How do you balance your current environment with cloud
Starting your cloud building from a basis of virtualization
Before the holidays I posted the first of three videos that Dell and Intel put together around cloud computing. These videos are part of a larger series around key topics like IT reinvention, the consumerization of IT, social media etc.
This second video features myself, Dell’s VP of Platform marketing Sally Stevens and John Pereira, Intel’s director of data center and hosting.
Some of the ground we cover
Cloud as a component of a larger portfolio of compute models
Small companies and the power of the cloud (Animoto case study)
How much of IT spend goes towards maintenance and how can we lower this
The WordPress.com stats helper monkeys prepared a 2011 annual report for this blog.
Here’s an excerpt:
The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 46,000 times in 2011. If it were a concert at Sydney Opera House, it would take about 17 sold-out performances for that many people to see it.
Earlier this year Dell and Intel did a series of videos around key topics like cloud computing, IT reinvention, the consumerization of IT, social media etc. Within these there was a mini-series that dealt with cloud computing that I participated in.
Here is the first one that features Dell’s CIO Robin Johnson, John Pereira, Intel’s director of data center and hosting, Forrest Norrod who is the VP and GM of Dell’s server platform group and myself.
Some of the topics we hit on:
How cloud relates to grid compute
How start-ups and smaller companies leverage the cloud and how that may change as they grow
The benefit of velocity and near instantaneous deployment that cloud brings
The federal government’s “Cloud First” initiative and how that will promote adoption
Besides interviewing a bunch of people at Hadoop World, I also got a chance to sit on the other side of the camera. On the first day of the conference I got a slot on SiliconANGLE’s the Cube and was interviewed by Dave Vellante, co-founder of Wikibon and John Furrier, founder of SiliconANGLE.
Last but not least in the 10 interviews I conducted while at Hadoop World is my talk with Splunk’s CTO and co-founder Erik Swan. If you’re not familiar with Splunk, think of it as a search engine for machine data, allowing you to monitor and analyze what goes on in your systems. To learn more, listen to what Erik has to say:
Some of the ground Erik covers:
What is Splunk and what do they do?
(1:43) The announcement they made at Hadoop World about integrating with Hadoop and what that means.
(4:25) How Erik and Rob Das got the idea to get involved in the wacky world of machine data and to create Splunk.
As I mentioned in my previous entry, the code for the Hadoop barclamps is now available at our github repo.
To help you through the process, Crowbar lead architect Rob Hirschfeld has put together the two videos below. The first, Crowbar Build (on cloud server), shows you how to use a cloud server to create a Crowbar ISO using the standard build process. The second, Advanced Crowbar Build (local) shows how to build a Crowbar v1.2 ISO using advanced techniques on a local desktop using a virtual machine.
Earlier this month we announced that Dell would be open sourcing the Crowbar “barclamps” for Hadoop. Well today is the day and the code is now available at our github repo.
What’s a Crowbar barclamp?
If you haven’t heard of project Crowbar it’s a software framework developed at Dell that started out as an installation tool for OpenStack. As the project grew beyond installation to include monitoring capabilities, network discovery, performance data gathering etc., the developers behind it, Rob Hirschfeld and Greg Althaus, decided to rewrite it to allow modules to plug into the basic Crowbar functionality. These modules or “barclamps” allow the framework to be used by a variety of projects. Besides the OpenStack and Hadoop barclamps written by Dell, VMware created a Cloud Foundry barclamp and DreamHost created a Ceph barclamp.
To help you get your bearings
As I mentioned in the opening paragraph, the code for the Hadoop barclamp is now available. To help you get started, below are a couple of videos that Rob put together. The first walks you through how to install Crowbar and the second one explains how to use Crowbar to deploy Hadoop.
I’m getting near the end of the interviews that I did while at Hadoop World earlier this month, just one more after this (with Splunk’s CTO and co-founder).
Today’s entry features a talk I had with Nosh Petigara, director of product strategy at 10gen, the company behind MongoDB.
Some of the ground that Nosh covers
Who is 10gen and what is MongoDB
(0:29) How does Nosh define NoSQL
(1:20) What use cases is Mongo best at
(2:14) Some examples of customers using Mongo (foursquare, Disney and MTV) and what they’re using it for
(3:08) How Mongo and Hadoop work together
(4:03) What’s in Mongo’s future that Nosh is excited about
Extra-credit reading
Mongo Conference: MongoSV (Dec 9 in Silicon Valley)
Todd Papaioannou has been in Big Data for a while. He built the original engineering team at Greenplum, worked at Teradata for 5 years and most recently, before joining Battery Ventures as an Entrepreneur in Residence, served as Yahoo’s Chief Cloud architect.
I grabbed some time with Mr. P to learn what it means to be an EIR and what he’s seeing in the industry from his vantage point.
Some of the ground Todd covers
Todd’s background
(0:45) What is an Entrepreneur in Residence and how did Todd become one
(2:45) What trends is he seeing in the space and how does he feel the market’s evolving
(4:00) What are his big take aways from this year’s Hadoop World
But wait, there’s more!
Stay tuned for more interviews from last week’s Hadoop world. On tap are:
I’m always interested in what’s happening at Canonical and with Ubuntu. Last week at Hadoop World I ran into a couple of folks from the company (coincidentally both named Mark but neither Mr. Shuttleworth). Mark Mims from the server team was willing to chat so I grabbed some time with him to learn about what he was doing at Hadoop World and what in the heck is this “charming” Juju?
Some of the ground Mark covers
Making the next version of Ubuntu server better for Hadoop and big data
(0:34) What are “charms” and what do they have to do with service orchestration
(2:05) Charm school and learning to write Juju charms
(2:54) Where does “Orchestra” fit in and how can it be used to spin up OpenStack
(3:40) What’s next for Juju
But wait, there’s more!
Stay tuned for more interviews from last week’s Hadoop world. On tap are:
One thing Hadoop isn’t great at right out of the box is data analytics; that’s where a company like Karmasphere comes in. Karmasphere provides business intelligence software that data analysts can use to mine the data that Hadoop sucks up.
Last week at Hadoop World I grabbed some time with Karmasphere’s Chairman and co-founder, Martin Hall, to learn more about where he and his company play in the wild world of big data.
Some of the ground Martin covers
Where does Karmasphere play in the big data stack, how is it used and by whom
(0:38) Where did the idea for developing Karmasphere come from
(1:58) What is the Karmasphere “secret sauce”
(2:18) What are the main industries and use cases where their offerings are used
(3:40) What can we look forward to in future releases
But wait, there’s more!
Stay tuned for more interviews from last week’s Hadoop world. On tap are: Mark Mims of Canonical, Todd Papaioannou from Battery Ventures, John Gray of Facebook, Erik Swan of Splunk and Nosh Petigara of 10gen/MongoDB.