Cloud Architecture: A Primer
Introduction
Cloud architecture describes the design behind software that relies on the Internet for on-demand services. Such systems are structured so that the underlying infrastructure is accessed only when necessary, for instance to handle a user's request.
The required resources, such as compute servers or storage devices, are provisioned for a particular job and then released back into the shared pool once they are no longer needed. This approach ensures that applications can scale with ever-changing demand.
Here, we will look at an instance of cloud architecture that uses on-demand infrastructure to perform tasks specified by thousands of individual users. The application calls upon hundreds of virtual servers that compute in parallel within a distributed framework.
This approach readily tackles a host of troubles surrounding very-large-scale processing tasks, where the prescribed time frame would otherwise translate into a shortage of physical infrastructure.
There is also the matter of distributing the workload and coordinating the actions of a large number of machines, since each effectively runs a separate process and is not immune to failure. In such cases another machine must take its place automatically so as to avoid obstructing the entire procedure. This challenge is particularly daunting when you are dealing with dynamic, ever-changing workloads.
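To make the coordination problem concrete, here is a minimal sketch of farming tasks out to a pool of parallel workers and re-queuing any task whose worker fails. It is only illustrative: in a real cloud deployment the workers would be virtual servers rather than local processes, and the `process` function is a stand-in for the actual unit of work.

```python
# Minimal sketch: distribute tasks across parallel workers and retry failures
# so a single failed worker does not obstruct the entire procedure.
from concurrent.futures import ProcessPoolExecutor, as_completed

def process(task):
    # Placeholder for the real unit of work (e.g. converting one document).
    return f"done: {task}"

def run_all(tasks, max_retries=3):
    results = {}
    attempts = {t: 0 for t in tasks}
    pending = list(tasks)
    with ProcessPoolExecutor(max_workers=8) as pool:
        while pending:
            futures = {pool.submit(process, t): t for t in pending}
            pending = []
            for fut in as_completed(futures):
                task = futures[fut]
                try:
                    results[task] = fut.result()
                except Exception:
                    attempts[task] += 1
                    if attempts[task] < max_retries:
                        pending.append(task)  # another worker takes its place
    return results

if __name__ == "__main__":
    print(run_all([f"job-{i}" for i in range(20)]))
```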
On cloud architectures, the physical location of machinery is naturally irrelevant. All a developer has to concern himself or herself with is the APIs available for the myriad Internet-accessible services currently on the market. Many of them are heavily involved in industrial solutions in which reliability is paramount even (or especially) in the face of complexity. Scalability should be engineered in from the very beginning and, ideally, remain an extraneous concern to the individual consumer.
We are all aware of the financial advantages cloud computing architectures can bring, so we will be brief here. The biggest boon is that there are virtually zero upfront infrastructure costs, which chimes well with modern-day just-in-time business solutions. Resources are more efficiently utilized; usage-based costing pleases our accountants; parallel processing promises enormous time savings once everything else is in place.
Some random examples
Before we proceed to a more structured examination of CC architecture, let's do some window-shopping. Plenty of applications could utilize its power, from back-office chores to savvy web applications. From a strictly consumerist perspective, which is naturally inadequate for the technically inclined amongst us, we can split these architectures into three categories: pipelines, batches and websites.
Processing pipelines can be imagined as a primary-production assembly line that churns out products 24/7. Much of the time, the end result is an intermediate good that waits for further processing before it can be sold to the consumer through such portals as search engines and multimedia sites.
- Document processing: these are curiously more applicable to individual private consumers than corporate clients, but times are a-changing. The pipeline converts millions of documents from proprietary formats such as Microsoft Word DOC to PDF, or turns scanned pages into machine-readable text through OCR.
- Image processing: this usually means resizing, since search engines and limited global bandwidth have given developers an insatiable appetite for thumbnails and previews (a small pipeline sketch follows this list). The complementary measure is of course compression, though it exists for an entirely different reason owing to its lossless nature.
- Video transcoding: as a compute-intensive process, it is one of the first CC architectures to have really changed our habits, as opposed to webmail, which merely extended a previous paradigm.
- Indexing: crawlers jump all over the web, amassing data and feeding the distillate back to a variety of customers, from the ubiquitous search engine to marketers and academics.
- Data mining: this is usually executed over the Internet, but you can elect to work on a small subset of (privately owned) data, which brings us to the next category.
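Before moving on, here is a minimal sketch of the thumbnail pipeline mentioned under image processing above, using the Pillow library. The directory names are placeholders, and each worker in a cloud deployment would run the same loop over its own slice of the images.

```python
# Minimal sketch of an image-resizing pipeline with Pillow.
# Directory paths are placeholders.
from pathlib import Path
from PIL import Image

SOURCE = Path("incoming")      # originals uploaded by users
OUTPUT = Path("thumbnails")    # intermediate goods awaiting further processing
OUTPUT.mkdir(exist_ok=True)

for path in SOURCE.glob("*.jpg"):
    with Image.open(path) as img:
        img.thumbnail((128, 128))  # resize in place, preserving aspect ratio
        img.save(OUTPUT / path.name, "JPEG", quality=80)
```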
Batch processing is simply the simultaneous handling of a large number of similar tasks. It is usually found in office environments and certain software studios:
- Back-office applications are everywhere. Finance, insurance and retail all have vastly different requirements of the service, but the underlying CC architecture is usually eerily similar, which speaks volumes about these industries.
- Log analysis: enterprise applications generate logs for everything, from sales to asset management to how many times a user attempted to log in but failed (a small analysis sketch follows this list).
- Software development: nightly builds need to be compiled from freshly modified source code every night, as the name suggests. Automated unit and deployment testing runs a battery of benchmarks against different deployment configurations.
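As an illustration of the log-analysis item above, here is a minimal batch job that counts failed login attempts per user across a pile of log files. The log format, the LOGIN_FAILED marker and the directory name are assumptions for the sketch.

```python
# Minimal sketch of batch log analysis: count failed logins per user.
# The log format and directory are assumptions.
from collections import Counter
from pathlib import Path

failed_logins = Counter()

for logfile in Path("logs").glob("*.log"):
    for line in logfile.open():
        # Assumed format: "2011-03-01 12:00:00 LOGIN_FAILED user=alice"
        if "LOGIN_FAILED" in line:
            user = line.rsplit("user=", 1)[-1].strip()
            failed_logins[user] += 1

for user, count in failed_logins.most_common(10):
    print(f"{user}: {count} failed attempts")
```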
Websites are probably the venue where most of us first encountered CC architecture, albeit without knowing it.
- Some websites auto-scale during the day if they have a stream of traffic that follows a rough daily pattern.
- Others are not so fortunate. A major sports tournament has to be engineered around a huge, one-off (or at most annual) spike in demand. Frequently it needs to stream live video as well, which is a veritable challenge even for CC.
- In the middle are seasonal websites that might be marketing holiday wear or tax consultancy.
From this string of examples two distinct definitions of CC architecture begin to emerge. There is an engineer’s perspective, which simply describes the set-up as a massive number of similarly configured computing devices, all wired up, scalable, and principally communicating via an IP network. For the executive, it’s scalable and billed per usage.
The key to differentiating such cases is finding out whether a technology could be delivered under a different architecture, in which case the cloudiness is an adjunct rather than an integral component. The relationship between the services themselves and the CC hoopla must be scrutinized before we can really apply labels.
According to both NIST and Wikipedia, we can go about it in three directions: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). This perspective accords with the ITIL view of IT infrastructure as a whole, since it is similarly drawn around a set of services. As will become apparent, many of these services are currently deployed in non-cloud architectures as well, since organizations afflicted with sticky labor sometimes cannot afford to outsource, say, portal services to an external provider unless there is a way of letting people go.
Delivery Models
The delivery model describes the point where the user comes into contact with the services, and the three form a natural stack: platforms require an infrastructure, and software in turn needs a platform.
SaaS interacts directly with the consumer through the provider’s application on a cloud infrastructure. Configuration at the user end is severely limited as a result.
PaaS includes, but is not limited to, virtual machines and hardware, so long as the consumer can deploy his or her own application, whether purchased off the shelf or developed in the cloud's language. APIs and sandboxes are frequently supported by the provider as a bonus to attract custom. Since clouds are not designed with hardware- or OS-specific tie-ins in mind, the languages are chosen with portability in mind. Python, Java, PHP, Perl and .NET are popular choices.
As a support layer for SaaS, database services are thrown into the mix as well. Though consumers don't have a say in virtualization systems, server configuration, network capabilities, or storage solutions, there is always some leeway in how they choose to operate their application hosting environment.
IaaS is the last frontier. Users can actually interact with the infrastructure, even though it is inevitably abstracted and provisioned by the service provider. The ability to acquire all the fundamentals of computing, from processing and storage to networking and memory, is available here, but the actual implementation is performed by the other party as stipulated by service level agreements. Anything from operating systems to applications and miscellaneous snippets can be run here. Consumers do not manage or control the actual infrastructure, just the virtual instances and possibly some networking components such as firewalls and load balancers.
Infrastructure as a Service (IaaS)
- Server Hosting: the illusion is that you own your own data center. Hosting a cloud server means that you have access to the infrastructure for tasks where fine control is of utmost importance, because you don't build a road if you can simply take a ride elsewhere. For instance, reliability can be ensured with mirroring and multiple virtual instances, handled by automated processes without interrupting service.
- Operating System: Azure being the prodigal son. Deploying applications and data into the cloud technically makes it a PaaS, but then even IaaS needs its own OS. In any case you have an OS that provides scalable compute and storage facilities. Resellers such as Dell and HP now offer cloud services based on a variant called the Azure Platform Appliance.
- Virtual Instances: Amazon EC2 offers a true virtual computing environment, with web service interfaces for launching instances running a variety of operating systems (a launch sketch follows this list). Load custom application environments, manage them, and run: it makes no difference to EC2. You can either use a template or create your own machine image.
- CDN: content delivery/distribution network depending on whom you ask. Essentially your cloud contains copies of data placed at various nodes in the network so bandwidth for access can be maximized. No centralization, no centralized bottlenecks. Any content will do: web objects, downloadable objects, real-time video streams, and miscellaneous internet deliveries that are less glamorous but far more important, such as DNS, routes, and database queries.
- Info Sharing: another thing that, strictly speaking, doesn't belong to IaaS at all. But take an electronic medical records solution, together with billing and practice management software, and you will see the need for this varied and mutually incompatible software to share one digital repository of information. The data collection and processing occur at a very low level for something that seems very close to the user end.
- Web Servers: you can host one on Amazon Web Services if needs must. You manage a virtual machine, so with a little server administration you can obtain key pairs and PEM files, define access rules, launch an instance, hook up DNS and domain names, and have a server running. Tomorrow you could receive a full billion visitors and nothing, save your bank account, would break.
- Web Services: when two electronic devices talk over a network, it's a web service. So basically there are REST-compliant ones, where machines manipulate XML representations of web resources through stateless operations (see the sketch after this list), and everything else.
- Storage: networked online storage spread across multiple virtual servers, usually hosted by third parties. Hosting in large data centers necessitates virtualization, since any individual customer's demand is relatively small. Storage is usually accessed through a web-based user interface or, for machine-to-machine use, an API.
- Remote access: sometimes remote desktop software can be housed on thumb drives; the desktop is recreated via a connection to the cloud, so there's no longer a need to keep the local computer switched on. Of course, the trend is to abolish the local machine altogether, but it will be some time before that actually happens.
- Load balancing: the technique of splitting work between two or more resources, whether storage devices, processors, or network links. The point is to get the most out of the least and, hopefully, prevent bottlenecks or outright crashes (a round-robin sketch follows this list). Reliability through redundancy is harder than it sounds, so it warrants its own software and/or hardware.
- X.509: the public-key infrastructure standard for both single sign-on and privilege management, which stipulates the formats for public-key certificates and revocation lists, amongst other things.
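As promised under Virtual Instances, here is a minimal sketch of launching an EC2 instance through the web service API using the boto3 SDK. The AMI ID, key pair name, region and instance type are placeholders; swap in your own machine image or a template from the catalogue.

```python
# Minimal sketch: launch a virtual instance on EC2 via the web service API.
# The AMI ID, key pair and region are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # a template image or your own machine image
    InstanceType="t2.micro",
    KeyName="my-key-pair",            # placeholder key pair for SSH access
    MinCount=1,
    MaxCount=1,
)

print("Launched instance:", response["Instances"][0]["InstanceId"])
```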
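The Web Services item above singles out REST-compliant services; this minimal sketch shows the idea of stateless operations on resource representations. The endpoint URL and the document resource are hypothetical.

```python
# Minimal sketch of a stateless REST interaction: each request is
# self-contained, and resources are manipulated through standard HTTP verbs.
# The endpoint is hypothetical.
import requests

BASE = "https://api.example.com/v1"

# Retrieve an XML representation of a resource.
doc = requests.get(f"{BASE}/documents/42", headers={"Accept": "application/xml"})
print(doc.status_code, doc.headers.get("Content-Type"))

# Replace the resource with a new representation; no session state is kept
# between the two calls.
requests.put(
    f"{BASE}/documents/42",
    data="<document><title>Updated</title></document>",
    headers={"Content-Type": "application/xml"},
)
```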
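Finally, the load-balancing item can be reduced to a few lines. This is a deliberately naive round-robin rotation over placeholder host addresses, with a failure set so a downed node is skipped; production balancers add health checks, weighting and connection draining.

```python
# Minimal sketch of round-robin load balancing: rotate across back-ends and
# skip any that are marked as failed, so one failure does not stall the rest.
import itertools

class RoundRobinBalancer:
    def __init__(self, backends):
        self.backends = list(backends)
        self._cycle = itertools.cycle(self.backends)
        self.down = set()  # back-ends currently marked as failed

    def next_backend(self):
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no healthy back-ends available")

balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
balancer.down.add("10.0.0.2")       # pretend one node has failed
for _ in range(4):
    print(balancer.next_backend())  # 10.0.0.1, 10.0.0.3, 10.0.0.1, 10.0.0.3
```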
Platform as a Service (PaaS)
- Session State Management: as with ASP, where a user's session is tracked with Session State. Since web apps run over HTTP, a stateless protocol, every request is treated separately, as if each belonged to a different user. So as long as a user's session is alive, there needs to be a way to store its specific data for later access (a small sketch follows this list).
- Sandbox: it's a novel environment, and few developers have the luxury of waiting weeks for rack-and-stack testing environments. So play in a sandbox for demos, testing, evaluation and proof-of-concept work. Sandboxes are also increasingly invaluable as training aids.
- Knowledge Management: a concept under which enterprise applications can harvest, process and store content. The layers include a store for rapid access, a mart as an intermediate knowledge repository, and archives in the warehouse. This seemingly ill-defined blob of bizspeak is now actively pursued in content management, enterprise search, discussion forums and delivery portals.
- Data Mart: similarly, there are ways to deal with more arid forms of data, especially transactional records such as purchases. They are pulled from the sources into a warehouse, aggregated, and reported to business users. ETL, database, reporting and occasionally modeling tools come into play. The warehouse is the reporting venue itself, where additional operations may take place and three tiers are in play: staging for the developer's raw material, integration for medium-level abstractions, and access for end users.
- Content Management: traditional ECM is necessarily static, terminal-dependent, top-down and generally useless unless you're rich enough to customize it. Cloud-based content management is egalitarian, unless you are dealing with humongous amounts of content.
- UDDI: an XML-based registry for businesses so they can list themselves on the Internet and locate web service applications. An open industry standard and thus platform-independent, UDDI allows virtually anyone to publish services and discover one another. It is interrogated via SOAP and provides access to WSDL documents describing the interaction requirements.
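As a companion to the session-state item at the top of this list, here is a minimal sketch of a server-side session store keyed by a token. The in-memory dictionary and the one-hour expiry are placeholder choices; a cloud platform would persist sessions in a shared cache or database so any instance can serve the next request.

```python
# Minimal sketch of session state over a stateless protocol: each request
# carries only a token, and the server looks up (or creates) its data.
# In-memory storage and the one-hour TTL are placeholder choices.
import time
import uuid

SESSION_TTL = 3600   # seconds
_sessions = {}       # token -> (expiry timestamp, data dict)

def get_session(token=None):
    """Return (token, data) for an existing session, or create a new one."""
    now = time.time()
    if token in _sessions:
        expiry, data = _sessions[token]
        if expiry > now:
            _sessions[token] = (now + SESSION_TTL, data)  # refresh expiry
            return token, data
    token = uuid.uuid4().hex
    data = {}
    _sessions[token] = (now + SESSION_TTL, data)
    return token, data

# Two "requests" from the same user: the second carries the first's token.
token, data = get_session()
data["cart"] = ["widget"]
_, same_data = get_session(token)
print(same_data)  # {'cart': ['widget']}
```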
Software as a Service (SaaS)
A. Web 2.0 Apps
- Blogs: really just a website. As relevant to clouds as GeoCities was.
- Metadata Management: storing information about other information. URLs, images, videos.
- Wiki Services: really just another website, but with an online database. In large collaborative efforts, multi-tenancy becomes a must.
- Social Networking: yet another website, but possibly providing a front-end for CC in the near future.
B. Enterprise Services
- Workflow management: through CC, as opposed to in CC, the latter being a large and interesting topic in its own right.
- Forms Management, Human Resources, Groupware, Collaboration, Financial Management: necessarily complex in multinational corporations and occasionally suitable candidates for CC architecture solutions. However, SMEs are now also presented with an opportunity, since licensing costs are plummeting as a result of cloud competition.
- Geo-spatial and CRM: also necessarily multi-tenant as demand expands without limit.
- Digital Asset Management, CRM, Desktop Software and Process Automation Services: these present the opportunity to access the same commerce platform independent of device configuration.
A word on security
The first thing you touch in a new house is usually its locks and doorknobs, and the same might be said of CC. Without security provisions, not even infrastructure can be reliably piped through, let alone platforms or software. Note, however, that this is by no means a hierarchy: you can have SaaS as a standalone provision so long as it comes with its own set of locks and keys.