A good knowledge of data structures always helps in designing good systems, whether we are working with relational databases or NOSQL databases. However, this knowledge is much more important when working with NOSQL systems. One important fact to remember is that while a lot of optimization for SQL based solutions is during query time, NOSQL solutions (especially Hadoop/HBase) are design time optimized. You design an optimized schema and the queries are almost straight forward.
In data analytics, incremental processing for the aggregation is very important. When we want to serve real time data, we can not run over the old data and newly added data to calculate the overall aggregate. This makes incremental processing the first priority for real time data analytics. This in turn requires processing over the structured dataset. In this article I will try to compare some of the MPP(Massively Parallel Processing) architectures in this light.