New technology often comes with a new way of thinking. Usually, people are quick to embrace the exterior — the “shiny new object” component of new technology — but evolving a mindset is more difficult.
Developers and architects are no different.
For example, when the three pillars associated with Big Data first hit the scene (horizontal scaling, self-healing clusters and data locality), they required a shift in thinking.
Before horizontal scaling, organizations were used to vertical scaling: The 286 would get thrown out to make room for a 386, which was later discarded for a 486, which in turn was discarded for a Pentium and so on. Software works the same way. Oracle Server becomes Oracle RAC and is later upgraded to Oracle Exadata. But horizontal scaling meant that adding capacity was no longer a matter of discarding servers for upgraded models. Increasing CPU, memory or disk capacity was simply a matter of adding a server to the cluster.
Today, many organizations use Hadoop to build computer clusters from commodity hardware, because it is an open-source software framework that assumes hardware failures are common and should be handled automatically by the framework.
But Big Data doesn’t have to mean just Hadoop anymore.
Getting Your Organization’s Head in the Cloud
Hadoop is fantastic, but it has its limitations. Some of the downsides to Hadoop include:
- You need a Hadoop administration group that can tune the installation for performance. Hadoop nowadays comes with many components, which means it can get complicated fast.
- Your servers generally need to be up and running. With Hadoop, you might end up paying for servers even when you aren’t using them heavily.
As a result, many organizations are moving to the cloud. The cloud is easy to scale up, and the provider takes care of infrastructure chores you would otherwise handle yourself. For example, you would never have to worry about swapping out a hard disk. The technology makes a virtual cluster of computers available at all times.
But using the cloud for Big Data? For not just a few gigabytes, but petabytes of information?
The cloud is capable of handling many of the components that make up a Big Data platform.
Nine Big Data Components the Cloud Handles as Well as Hadoop
Nine Big Data components the cloud can handle just as well as Hadoop include:
- Data ingestion. Data ingestion is the process of obtaining and importing data for immediate use or storage in a database. Using the cloud, you can set up an API and connect it to compute services that allow serverless code execution. Your organization would pay only for code execution time. Compare this with Hadoop, where your servers generally need to be up and running.
- Data processing in real time. Almost every organization today demands data processing in real time: a constant input, process and output of data. Using the cloud, your organization can stream data in real time.
- Storing data. Data stored in the cloud is highly available and highly durable. Versioning can be turned on.
- Analyzing data. Your organization can analyze data in a variety of ways: regression, in which the output variable takes continuous values, or classification, in which the output variable takes class labels. Predictions can be produced in batch mode, over a whole dataset at once, or in synchronous mode, one request at a time.
- Search. Elastic search services in the cloud are designed for horizontal scalability, reliability and easy management.
- Scalability. Most cloud-based services allow your organization to add resources to a cluster with the click of a button.
- Fault tolerance. One of the pillars of Big Data is that the cluster be self-healing: Data is replicated across multiple machines. In a physical cluster, if you were to blow away one machine with a shotgun, the cluster would figure out what that machine was doing, reassign its work and keep the data safe on the remaining replicas. Cloud-based services are no different. They let you treat infrastructure as code, so recovery can be automated to react when an event occurs.
- Security. Policies can be set up for each resource with fine-grained access control. While this is arguably one area that could still call for a skilled administrator even in the cloud, security for your clusters can still be managed through the cloud itself.
- Cost. Cloud-based services aren’t like your gym membership. In general, you don’t have to sign up for an impossible-to-get-out-of contract, and you only pay for what you use.
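To make the data ingestion item above more concrete, here is a minimal sketch of the kind of function a serverless compute service might run behind an API. It is an illustration under stated assumptions, not any particular provider's API: the `handler(event, context)` signature, the `body` key, and the `device_id` and `reading` fields are all hypothetical, and the step that would write the record to storage is left as a comment.

```python
import json
from datetime import datetime, timezone

# Hypothetical schema for one incoming record (an assumption for this sketch).
REQUIRED_FIELDS = ("device_id", "reading")


def handler(event, context=None):
    """Validate one incoming record and return an HTTP-style response.

    The function holds no state between calls, which is what lets a
    serverless platform run it only when a request arrives.
    """
    try:
        record = json.loads(event.get("body") or "")
    except (json.JSONDecodeError, TypeError):
        return {"statusCode": 400,
                "body": json.dumps({"error": "body is not valid JSON"})}

    if not isinstance(record, dict):
        return {"statusCode": 400,
                "body": json.dumps({"error": "body must be a JSON object"})}

    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        return {"statusCode": 400,
                "body": json.dumps({"error": "missing fields", "fields": missing})}

    # Stamp the record with its arrival time in UTC.
    record["ingested_at"] = datetime.now(timezone.utc).isoformat()

    # A real function would write the record to a stream or object store here.
    return {"statusCode": 202, "body": json.dumps(record)}


if __name__ == "__main__":
    sample = {"body": json.dumps({"device_id": "sensor-1", "reading": 21.5})}
    print(handler(sample)["statusCode"])
```

Because the function only does work when a record arrives, billing in this model can follow execution time rather than server uptime, which is the contrast with an always-on cluster drawn in the list.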
Gone are the days when Big Data required an onsite administrative team. With the cloud, your organization can achieve the same business insights at a lower price point. Daugherty can help you adapt to this new way of thinking.