Wednesday, December 19, 2018

Introduction to Algorithms

What is an Algorithm?

In the real world, there are problems for which solutions are sometimes readily available and sometimes have to be worked out. An algorithm is a sequence of well-defined operations performed to reach a solution; it is a blueprint of the approach you will follow to solve a problem. But a problem may have multiple candidate solutions. How do you select an algorithm in such a case? Is every proposed algorithm implemented, or do some criteria need to be checked first? Before implementing an algorithm, you first check its correctness and then evaluate it on other metrics, because among the many possible solutions to a problem, some implementations may cost you far more in terms of memory space or running time.

Asymptotic notations, Time complexity, and Space complexity

What are time complexity, space complexity, and asymptotic notations? Time complexity quantifies the amount of time an algorithm takes to run, and space complexity quantifies the amount of additional memory an algorithm needs while running. Asymptotic notations are used to express and analyze the running time of an algorithm as a function of the input size.
Three types of asymptotic notations are used to describe the running time of an algorithm. They are:
1. O Notation (Big O): expresses the upper bound of an algorithm's running time, i.e., the worst-case time complexity.
2. Ω Notation (Omega): expresses the lower bound of an algorithm's running time, i.e., the best-case time complexity.
3. θ Notation (Theta): expresses both the lower bound and the upper bound of an algorithm's running time, i.e., a tight bound.
In practice, we generally use the Big O notation for expressing the time complexity of algorithms, since we are usually interested in the worst-case behaviour of an algorithm with respect to the input size (n).
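To make these bounds precise, here is the standard formal reading of the three notations (a quick sketch, where f(n) stands for the algorithm's running time and g(n) for the bounding function):

f(n) = O(g(n)) if f(n) ≤ c·g(n) for some constant c > 0 and all sufficiently large n
f(n) = Ω(g(n)) if f(n) ≥ c·g(n) for some constant c > 0 and all sufficiently large n
f(n) = θ(g(n)) if f(n) is both O(g(n)) and Ω(g(n))

In other words, O gives a guarantee from above, Ω gives a guarantee from below, and θ pins the growth rate down from both sides.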
Consider an example where you must find an element in an array. You are given an 'inputArray' and an element in a variable named 'toFind' that needs to be searched for in the inputArray. How will you find the element? One of the simplest approaches is to start with the first element of the 'inputArray' and compare it with the element in the 'toFind' variable. If both match, you break out of the loop and print "Element Found"; otherwise, you move to the next element and repeat the same comparison. To iterate through all the elements, you write a loop that starts from the 0th index and runs till the (n−1)th index.

In the worst case (the element is at the last position or not present at all), the loop runs 'n' times, and the time complexity is proportional to the number of times this code segment runs. This results in a time complexity of O(n), known as the worst-case complexity. If the element is found at index 0, the for loop runs only once; in that case the time complexity is Ω(1), known as the best-case complexity.

int[] inputArray = {5, 4, 78, 1, 12, 14, 10, 8, 7};
int toFind = 10;
// Linear search: compare each element with 'toFind' until a match is found
for (int i = 0; i < inputArray.length; i++) {
    if (inputArray[i] == toFind) {
        System.out.println("Element Found");
        break;
    }
}
Consider another example where you need to find the maximum element in an array. You start with the first element; since only a single element has been traversed so far, you treat it as the maximum and store its value in the 'maximum' variable. Now you compare 'maximum' with the next element in the array. If 'maximum' is greater than the element, you simply move on, because 'maximum' is already greater than all the elements traversed so far; otherwise, you make that element the new maximum. This code segment runs over all n elements in both the best-case and the worst-case scenarios, and hence the time complexity is θ(n).

int[] inputArray = {5, 4, 10, 8, 7, 78, 1, 12, 14};
int maximum = inputArray[0];
// Scan the remaining elements, updating 'maximum' whenever a larger one is found
for (int i = 1; i < inputArray.length; i++) {
    if (inputArray[i] > maximum) {
        maximum = inputArray[i];
    }
}
System.out.println("The maximum element is: " + maximum);

Let's look at space complexity in more detail. Consider the Fibonacci number problem. The Fibonacci numbers form an integer sequence in which each number is the sum of the two preceding ones: the kth Fibonacci number, Fib(k), can be calculated as Fib(k-1) + Fib(k-2), with the boundary conditions Fib(0) = 0 and Fib(1) = 1. The sequence therefore starts 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

Suppose you need Fib(9) and you also have to keep all the numbers from Fib(0) to Fib(9). To store these ten values, you can either create ten separate variables or declare an array of size ten.



Now, you can iterate over the positions and calculate each value from the previous two Fibonacci numbers as

fibArray[i] = fibArray[i-1] + fibArray[i-2];



By the time you reach Fib(9), you have the whole list of Fibonacci numbers up to the ninth one. However, to calculate the nth Fibonacci number this way, you have used an array whose size grows with n. That array is the additional space used by the program, so the space complexity is O(n), and since the loop runs once per position, the time complexity is also O(n).

int fibNumber = 9;
int[] fibArray = new int[fibNumber + 1];
fibArray[0] = 0;
fibArray[1] = 1;
// Each Fibonacci number is the sum of the previous two
for (int i = 2; i <= fibNumber; i++) {
    fibArray[i] = fibArray[i - 1] + fibArray[i - 2];
}
// Prints: 0 1 1 2 3 5 8 13 21 34
for (int i = 0; i <= fibNumber; i++) {
    System.out.print(fibArray[i] + " ");
}

While calculating the complexity of an algorithm, you will need to do some basic mathematical calculations and know which lower-order polynomial terms can be neglected. To know more about it and learn how to use it, you should go through this link.
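As a small worked example of neglecting lower-order terms (the polynomial here is made up purely for illustration): suppose an algorithm takes 3n^2 + 5n + 2 steps on an input of size n. For every n ≥ 1,

3n^2 + 5n + 2 ≤ 3n^2 + 5n^2 + 2n^2 = 10n^2

so the running time is O(n^2) with the constant c = 10. The lower-order terms 5n and 2 disappear into the constant, which is why only the highest power of n matters.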
To know about the other asymptotic notations in detail, follow these links: Link 1 and Link 2

Thursday, December 13, 2018

Architectural overview of Hadoop


What is Hadoop? Apache Hadoop is a framework for distributed computation and storage of very large data sets on computer clusters. Hadoop began as a project to implement Google's MapReduce programming model, and has become synonymous with a rich ecosystem of related technologies, not limited to: Apache Pig, Apache Hive, Apache Spark, Apache HBase, and others. Hadoop has seen widespread adoption by many companies including Facebook, Yahoo!, Adobe, Cisco, eBay, Netflix, and Datadog.

Hadoop architecture overview

Hadoop has three core components, plus ZooKeeper if you want to enable high availability:

  • Hadoop Distributed File System (HDFS)
  • MapReduce
  • Yet Another Resource Negotiator (YARN)
  • ZooKeeper
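To give a feel for the MapReduce programming model mentioned above, here is a minimal sketch of the classic word-count mapper, assuming the standard org.apache.hadoop.mapreduce API (the class name and field names below are my own choices, not anything prescribed by Hadoop):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map step of word count: for each word in an input line, emit the pair (word, 1).
// The framework groups these pairs by key and hands them to a reducer that sums the counts.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // emit (word, 1)
        }
    }
}

A matching reducer would simply add up the 1s emitted for each word; the links in the next section discuss the MapReduce model and its applications in more depth.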

MapReduce and Its Applications, Challenges, and Architecture

Links:

  • https://www.academia.edu/37445330/MapReduce_and_Its_Applications_Challenges_and_Architecture_a_Comprehensive_Review_and_Directions_for_Future_Research.pdf
  • https://www.academia.edu/10175250/Iterative_MapReduce_and_its_applications

Install Lubuntu in VirtualBox for Apache Hadoop Installation

Links:

  • https://examples.javacodegeeks.com/enterprise-java/apache-hadoop/apache-hadoop-cluster-setup-example-virtual-machines/#introduction --> This one worked fine up to the VM installations.
  • http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ --> This one is outdated.
  • https://www.edureka.co/blog/install-hadoop-single-node-hadoop-cluster?utm_campaign=quora-post&utm_medium=content-link&utm_source=quora --> Has some unnecessary configuration steps.
  • https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html --> Clear explanation of the standalone and pseudo-distributed modes (see the config sketch after this list).
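For quick reference, the pseudo-distributed (single-node) setup described in the last link essentially comes down to two small configuration files. The property names are standard Hadoop ones; the localhost port and replication factor shown here are the commonly suggested defaults, so adjust them for your own environment.

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <!-- URI of the default file system: where clients find the NameNode -->
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <!-- keep a single copy of each block, since there is only one DataNode -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>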
