What amazes me about working with large amounts of data, apart from the sheer volume, is the pile of waste it produces. If I work with one terabyte of data, I would expect less than one gigabyte of it to actually be usable. That is less than one per mille: 999 parts of waste for every part of value.
On the other side of the coin, my colleagues responsible for data analysis spend almost the whole day just cleaning data. You may have invested years in training them to quickly develop intelligent algorithms, but it turns out their main job is taking out the rubbish. Algorithms are very sensitive beings: you may feed them only carefully selected material, or else they choke or clog up.
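To make the "waste separation" concrete, here is a minimal sketch of that cleaning step: raw records are split into usable material and waste before any algorithm sees them. The field names (`id`, `value`) and validity rules are illustrative assumptions, not something prescribed by the text.

```python
# Minimal "waste separation" sketch: only well-formed records
# reach the algorithm; everything else is sorted out as waste.
# Field names and validity rules are illustrative assumptions.

def is_usable(record):
    """A record is usable if it has a non-empty id and a numeric value."""
    if not record.get("id"):
        return False
    try:
        float(record["value"])
    except (KeyError, TypeError, ValueError):
        return False
    return True

def separate_waste(records):
    """Split raw records into (usable, waste)."""
    usable, waste = [], []
    for r in records:
        (usable if is_usable(r) else waste).append(r)
    return usable, waste

raw = [
    {"id": "a1", "value": "3.14"},
    {"id": "",   "value": "7"},     # missing id      -> waste
    {"id": "b2", "value": "n/a"},   # non-numeric     -> waste
    {"id": "c3", "value": "42"},
]
usable, waste = separate_waste(raw)
print(len(usable), len(waste))  # → 2 2
```

Even in this toy example, half the input is waste; at real-world ratios the filter, not the algorithm, does most of the work.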
Waste separation in Big Data is a manual task. But how far can this scale? More and more data sources are being tapped, yet opening them up is like working in a mine: first you have to get at the precious ore that the algorithm can actually use.
If nobody invents an efficient process for waste separation, Big Data will soon come to its natural end. Universities cannot train all the waste collectors that manual separation would require.