How to Succeed with Big Data without an Admin Team

New technology often comes with a new way of thinking. Usually, people are quick to embrace the exterior — the “shiny new object” component of new technology — but evolving a mindset is more difficult.

Developers and architects are no different.

For example, when the three pillars associated with Big Data first hit the scene (horizontal scaling, self-healing clusters and data locality), they required a shift in thinking.

Before horizontal scaling, organizations were used to vertical scaling: The 286 would get thrown out to make room for a 386, which was later discarded for a 486, which in turn was discarded for a Pentium, and so on. Software worked the same way: Oracle Server became Oracle RAC and was later upgraded to Oracle Exadata. But horizontal scaling meant that adding capacity was no longer a matter of discarding servers for upgraded models. Increasing CPU, memory or disk capacity was simply a matter of adding a server to the cluster.

Today, many organizations use Hadoop to build computer clusters from commodity hardware, because it is an open-source software framework that assumes hardware failures are common occurrences and handles them automatically.

But Big Data doesn’t have to mean just Hadoop anymore.

Getting Your Organization’s Head in the Cloud

Hadoop is fantastic, but it has its limitations. Some of the downsides to Hadoop include:

  • You need a Hadoop administration team that can tune the installation for performance. Hadoop now ships with many components, which means it can get complicated fast.
  • Your servers generally need to be up and running. With Hadoop, even if you aren’t using your servers heavily, you still end up paying for them.

As a result, many organizations are moving to the cloud. It’s easy to scale up, and the cloud fills some key infrastructure gaps on its own. For example, you would never have to worry about swapping out a failed hard disk; a virtual cluster of computers is simply available at all times.

But using the cloud for Big Data? For not just a few gigabytes, but petabytes of information?

The cloud is capable of handling the core components of a Big Data platform.

Nine Big Data Components the Cloud Handles as Well as Hadoop

Nine Big Data components the cloud can handle just as well as Hadoop include:

  1. Data ingestion. Data ingestion is the process of obtaining and importing data for immediate use or storage in a database. Using the cloud, you can set up an API and connect it to compute services that allow serverless code execution (see the ingestion sketch after this list). Your organization pays only for code execution time. Compare this to Hadoop, in which your servers generally need to be up and running.
  2. Data processing in real time. Almost every organization today demands real-time data processing: a constant flow of input, processing and output. Using the cloud, your organization can stream data in real time.
  3. Storing data. Data stored in the cloud is highly available and highly durable, and versioning can be turned on so that older revisions of an object are retained (see the versioning sketch after this list).
  4. Analyzing data. Your organization can analyze data in a variety of ways: regression, in which the output variable takes continuous values, or classification, in which the output variable takes class labels (see the regression-versus-classification sketch after this list). Data can also be analyzed in batch or synchronous modes.
  5. Search. Managed Elasticsearch services in the cloud are designed for horizontal scalability, reliability and easy management.
  6. Scalability. Most cloud-based services let your organization add resources to a cluster with the click of a button.
  7. Fault tolerance. One of the pillars of Big Data is that the platform be self-healing: Data is replicated across machines. In a physical cluster, if you were to blow away one machine with a shotgun, the cluster would work out what that machine was doing and preserve its data. Cloud-based services are no different. They treat infrastructure as code and can be automated to react when something fails.
  8. Security. Policies can be set up for each resource with fine-grained access control. While this is arguably the one area that may still call for a skilled administrator, security for your clusters can be managed through the cloud itself.
  9. Cost. Cloud-based services aren’t like your gym membership. In general, you don’t have to sign up for an impossible-to-get-out-of contract, and you only pay for what you use.
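
To make the ingestion point concrete, here is a minimal sketch, assuming an AWS Lambda function sitting behind an API endpoint and writing to Amazon S3. The bucket name, event shape and handler name are hypothetical; the point is that nothing here is a server you have to administer, and you pay only while the function runs.

```python
# Minimal serverless-ingestion sketch. Assumes AWS Lambda and Amazon S3;
# the bucket name and event shape are hypothetical.
import json
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "example-ingest-bucket"  # hypothetical bucket name


def handler(event, context):
    """Receive one record from the API and land it in object storage."""
    record = json.loads(event.get("body", "{}"))
    key = f"raw/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(record))
    # Billing covers only the milliseconds this function runs; there is
    # no always-on ingestion server to patch or tune.
    return {"statusCode": 200, "body": json.dumps({"stored": key})}
```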
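
Item 3’s versioning claim is similarly a one-liner in practice. The sketch below assumes Amazon S3 and a hypothetical bucket name; once versioning is enabled, overwrites and deletes no longer destroy earlier revisions of an object.

```python
# Versioning sketch. Assumes Amazon S3; the bucket name is hypothetical.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-data-lake-bucket"  # hypothetical bucket name

# Turn on versioning: subsequent writes to the same key create new
# versions instead of replacing the old object.
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Listing object versions shows the revision history kept for each key.
response = s3.list_object_versions(Bucket=BUCKET, Prefix="raw/")
for version in response.get("Versions", []):
    print(version["Key"], version["VersionId"], version["LastModified"])
```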
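
And for item 4, here is a toy illustration of the regression-versus-classification distinction using scikit-learn, which most cloud analytics services wrap or closely resemble; the data is invented purely to show the shape of each problem.

```python
# Toy regression vs. classification example with scikit-learn; the data
# is made up purely to show the shape of each problem.
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4]]  # one input feature per record

# Regression: the output is a continuous value (for example, monthly spend).
y_continuous = [10.0, 19.5, 30.2, 39.8]
regressor = LinearRegression().fit(X, y_continuous)
print(regressor.predict([[5]]))  # roughly 50, a continuous value

# Classification: the output is a class label (for example, churn vs. no churn).
y_labels = [0, 0, 1, 1]
classifier = LogisticRegression().fit(X, y_labels)
print(classifier.predict([[5]]))  # a class label, 0 or 1
```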

Gone are the days when Big Data required an on-site admin team. With the cloud, your organization can achieve the same business insights at a lower price point. Daugherty can help you adapt to this new way of thinking. For more information, contact us today.

Privacy by Design: Why It’s Critical to Your Organization

No surprise: Privacy is a big deal.

If your organization is storing data about people, privacy should be a big deal to you. This is especially true if you’re storing data about people in the European Union (EU), in light of the General Data Protection Regulation (GDPR), which goes into effect May 25, 2018.

GDPR is designed to strengthen and unify data protection for all individuals within the EU and give control back to citizens over their personal data. Under GDPR, organizations in breach can be fined up to four percent of their annual global turnover (revenue) or €20 million (whichever is greater).

This is the maximum fine, which can be imposed for the most serious infringements, like not having sufficient customer consent to process data.

And it’s per infringement.

So what precisely does it mean to have data that aren’t properly anonymized?

The Look of Data that Can Incur Hefty Fines

Weak anonymization algorithms are one way of violating user privacy.

Remember Where’s Waldo? In the books, Waldo is hidden among a large crowd, and we are invited to pore over the pages, scanning for his trademark red-and-white-striped shirt, bobble hat and glasses. Knowing what to look for makes it slightly easier, although the books introduce red herrings to make Waldo more difficult to spot.

Imagine if an entire Where’s Waldo? illustration contained a mass of people all wearing dull green, surrounded by dull green landmarks, and Waldo in his trademark red.

He’d be really easy to spot.

A GDPR infringement occurs if somebody can determine a person’s identity through data, even if anonymization algorithms are in place. Best intentions don’t matter.
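
To make that concrete, here is a minimal sketch of a re-identification check, assuming a pandas DataFrame with hypothetical quasi-identifier columns. Even with names stripped out, a record is only as anonymous as the size of the crowd that shares its remaining attributes; any combination that maps to a single person is a Waldo in red.

```python
# Re-identification check sketch; column names and values are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "zip_code":   ["63101", "63101", "63101", "63102"],
    "birth_year": [1980,     1980,    1975,    1975],
    "gender":     ["F",      "F",     "M",     "M"],
})

quasi_identifiers = ["zip_code", "birth_year", "gender"]

# Count how many records share each combination of quasi-identifiers
# (the "k" in k-anonymity).
group_sizes = df.groupby(quasi_identifiers).size()

# Any combination shared by only one record singles that person out,
# even though no name or customer ID appears anywhere in the data.
risky = group_sizes[group_sizes == 1]
print(risky)
```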

One of the best methodologies your organization can institute to comply with GDPR is to adopt a Privacy by Design approach to your systems.

Privacy by Design: Where Outcomes, Not Intentions, Matter

Privacy by Design is an approach to systems engineering that takes privacy into account throughout the whole engineering process.

It’s not about data protection per se.

Rather, the system is engineered in such a way that it doesn’t need protection.

The root principle is enabling a service without the client becoming identifiable or recognizable.

Three examples of Privacy by Design include:

  1. Dynamic Host Configuration Protocol (DHCP). With DHCP, a server maintains a pool of IP addresses, and randomly assigns an IP address to a device. Because the IP address is “leased” to a device, it doesn’t leak personal identifiers about the person using the device.
  2. Global Positioning System (GPS). A GPS receiver doesn’t transmit any data; it only listens to signals broadcast from GPS satellites whose positions are known. It can compute your geographic position locally, without revealing your identity or whereabouts to anyone else.
  3. Radio-Frequency Identification (RFID). As it pertains to the Internet of Things (IoT), RFID can act as the bridge between the physical and digital world. The RFID tag is preregistered with the host system to establish identification. Then the tag communicates only by broadcasting its ID.

A zero-knowledge proof is one way you can implement Privacy by Design: it establishes that a claim is true without revealing the personal data behind it.

For example, a gambling website may use a Facebook sign-in, which can confirm proof of age by asking Facebook rather than collecting the visitor’s date of birth itself.
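
A minimal sketch of that idea follows, with entirely hypothetical names (it is not a real Facebook API). The point is the shape of the exchange: the identity provider answers a yes-or-no question, and the site never receives a birth date or any other identifier.

```python
# Hypothetical proof-of-age exchange; nothing here is a real provider API.
from datetime import date

# Pretend this record lives with the identity provider, not the gambling site.
_PROVIDER_RECORDS = {"user-123": date(1985, 6, 1)}


def provider_is_over(user_token: str, minimum_age: int) -> bool:
    """The identity provider answers only yes or no; no birth date leaves it."""
    birth_date = _PROVIDER_RECORDS[user_token]
    today = date.today()
    age = today.year - birth_date.year - (
        (today.month, today.day) < (birth_date.month, birth_date.day)
    )
    return age >= minimum_age


def site_allow_entry(user_token: str) -> bool:
    """The site keeps only the boolean answer, never any personal data."""
    return provider_is_over(user_token, minimum_age=18)


print(site_allow_entry("user-123"))  # True, yet the site never saw a birth date
```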

In another example, a risqué game in the 1980s might ask questions about baseball players that only an older audience would know. Of course, the questions didn’t prevent a baseball-prodigy youngster from gaining access.

How Privacy by Design is achieved depends on the application, the technologies and the choice of approach. Daugherty can anonymize your data to protect you from penalties under GDPR. We can also analyze it to identify opportunities where your organization can expand. Contact us today.