Big data and analytics – the media is on the bandwagon

There are 3 items in London’s @CityAM paper this morning on big data and analytics.

City A.M. readers are business people, mainly in London and, if they are like me, on the Tube. City A.M. only writes what it thinks this audience is interested in reading. And it seems business people like reading about big data and analytics a lot.

On page 22 is a case study, “Using data to drive new sales”, on a firm, “The Outside View”, that uses data and analytics to find prospects. The main thread is about using a very wide range of data, not just internal data: LinkedIn, for example.

The second is an opinion piece, “Why it’s nimble SMEs that are best-positioned to capitalise on the huge benefits of big data”. It is mainly about the lower cost of managing big data: the cloud, etc.

Online there is a third, “The big data toolkit”, by @jacquitaylorfb.

If you are pushing data and analytics in your organisation, these might make good PR for you.

Why big data matters for accountants – a good read

In today’s London City A.M. paper, on page 25, there is a good article: “Why big data matters for accountants”.

Some main points for me:

  • When accountants say something is a gold rush you have to believe it.
  • “The ability to link data sets is creating new insights” is in the second paragraph. So even accountants agree that data design is a first-order issue.

However, there is no example use case given. Some come to mind, but I would be very interested in your ideas. What are the big data use cases for accountants? Please leave a comment.

Why big data needs models more than most

The dirty secret of big data is exposed in this very good posting: “Forget the Algorithms and Start Cleaning Your Data”.

Big data projects fail not because of the technology but because of the inability to wire the data together. The lack of success is driven by:

  1. Poor data quality and/or inadequate data error handling
  2. Incompatible or poorly understood semantics from different data sources
  3. Complex matching rules between data sources

The blog suggests that big data tooling therefore needs to focus on the burden of integrating, cleaning, and transforming the data prior to analysis. For example, RapidMiner has 1,250 algorithms for this purpose. That might be good, but it is also very complex for the average human.

Sounds like a classic case of the need for separation of concerns, right? Untangling and designing solutions to these first-order problems is data modelling. Given big data’s fluid data structures, that means datapoint modelling. Building solutions with 1,250 data manipulation algorithms, Hadoop, huge databases, etc. can then be based on visible logic and good design. With the alternative, jumping right into the build, best of luck!
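
To make the separation of concerns a little more concrete, here is a minimal Python sketch of the three problems listed above, each kept in its own visible place: a semantic mapping per source, cleaning with explicit error handling, and a single matching rule. Everything in it is an assumption for illustration only: the source names ("crm", "web"), the field names, the Firm model and the sample records are hypothetical, and it is not taken from the posting or from any particular tool.

```python
from dataclasses import dataclass

# Hypothetical semantic mapping: each source's field names -> shared model names.
FIELD_MAP = {
    "crm": {"Company": "name", "Post Code": "postcode"},
    "web": {"org_name": "name", "zip": "postcode"},
}


@dataclass(frozen=True)
class Firm:
    """The shared model that every source is mapped onto."""
    name: str
    postcode: str
    source: str


def to_model(source, raw):
    """Map one raw record onto the model and clean it; return (firm, error)."""
    mapping = FIELD_MAP[source]
    fields = {target: (raw.get(col) or "").strip() for col, target in mapping.items()}
    missing = sorted(k for k, v in fields.items() if not v)
    if missing:
        # Error handling is explicit: bad records are reported, not silently dropped.
        return None, f"{source}: missing {', '.join(missing)}"
    firm = Firm(
        name=" ".join(fields["name"].lower().split()),
        postcode=fields["postcode"].upper().replace(" ", ""),
        source=source,
    )
    return firm, None


def match_key(firm):
    """The matching rule, in one place: same normalised name and postcode."""
    return (firm.name, firm.postcode)


def integrate(records):
    """Return (groups of matched firms keyed by match_key, list of errors)."""
    groups, errors = {}, []
    for source, raw in records:
        firm, error = to_model(source, raw)
        if error:
            errors.append(error)
            continue
        groups.setdefault(match_key(firm), []).append(firm)
    return groups, errors


# Made-up sample data: two records for the same firm, one unusable record.
records = [
    ("crm", {"Company": "Acme  Ltd ", "Post Code": "ec1a 1bb"}),
    ("web", {"org_name": "ACME Ltd", "zip": "EC1A1BB"}),
    ("web", {"org_name": "", "zip": "N1 9GU"}),
]
groups, errors = integrate(records)
```

With the sample data, the two Acme records end up in one group and the record with no name lands in the error list. The point is not the code itself but that the mapping, the cleaning and the matching rule can each be read and changed on their own before any heavier tooling is brought in.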