Data as it is: Data type: Data has some features, which are as follows:
1) It can be very large, so the algorithms used for these analyses have to be scalable. Otherwise you will find that your algorithm runs very fast on a small amount of data but hangs when it has to process more. (The complexity of algorithms will be considered later.)
2) High dimensionality: data can have thousands of dimensions. Hmmm, thousands.
3) Very complex in nature: sensor data, various data streams (sound), time series data, temporal data, sequence data, multimedia data, text or web data, graph data or social network data, etc.
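
To make point 1 a little more concrete, here is a minimal sketch of why an algorithm can feel fast on small data and then hang on large data. It is an illustration only, not a method from this text: the function names `linear_scan_ops` and `all_pairs_ops` are hypothetical, and they simply count basic steps instead of measuring real running time.

```python
def linear_scan_ops(n):
    """Count the steps of one pass over n records (e.g. computing a sum)."""
    ops = 0
    for _ in range(n):
        ops += 1
    return ops


def all_pairs_ops(n):
    """Count the steps of comparing every record with every other record."""
    ops = 0
    for i in range(n):
        for j in range(i + 1, n):
            ops += 1
    return ops


if __name__ == "__main__":
    # Hypothetical data sizes, chosen only to show the growth pattern.
    for n in (100, 1000, 2000):
        print(n, linear_scan_ops(n), all_pairs_ops(n))
```

Going from 1000 to 2000 records doubles the work of the single pass (1000 to 2000 steps) but roughly quadruples the work of the all-pairs comparison (499,500 to 1,999,000 steps). An approach that grows like the second one is the kind that works on a small sample and then stalls on the full data.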