Why big data needs models more than most

The dirty secret of Big Data is exposed in this very good post: Forget the Algorithms and Start Cleaning Your Data.

Big data projects fail not because of the technology but because of the difficulty of wiring the data together. The lack of success is driven by:

  1. Poor data quality and/or inadequate data error handling
  2. Incompatible or poorly understood semantics from different data sources
  3. Complex matching rules between data sources

The blog suggests that big data tooling therefore needs to focus on the burden of integrating, cleaning, and transforming the data prior to analysis. Example: RapidMiner has 1,250 algorithms for this purpose. That might be good, but it is also very complex for the average human.

Sounds like a classic case of the need for separation of concerns, right? Untangling these first-order problems and designing solutions to them is data modeling. Given big data’s fluid data structures, that means datapoint modelling. Solutions built with 1,250 data manipulation algorithms, Hadoop, huge databases and so on can then be based on visible logic and good design. As for the alternative, jumping right into build: best of luck!
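
To make the separation of concerns concrete, here is a minimal sketch in Python. It is only an illustration, not taken from the post or from any tool mentioned here: the `Customer` model, the `COUNTRY_ALIASES` table and the function names are all hypothetical. Each of the three problems listed above gets its own small piece of logic, and the matching rule is written against the model rather than against raw source fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Customer:
    """Hypothetical target model: the one agreed meaning of a customer data point."""
    email: str
    country: str  # two-letter country code, e.g. "GB"


# Concern 2 (semantics): each source vocabulary is mapped to the model in one place.
# The alias table is an assumption made up for this example.
COUNTRY_ALIASES = {
    "uk": "GB",
    "united kingdom": "GB",
    "gb": "GB",
    "usa": "US",
    "us": "US",
}


def clean(raw: dict) -> Optional[dict]:
    """Concern 1 (data quality): normalise text and reject unusable records."""
    email = (raw.get("email") or "").strip().lower()
    country = (raw.get("country") or "").strip().lower()
    if "@" not in email or not country:
        return None
    return {"email": email, "country": country}


def to_model(cleaned: dict) -> Optional[Customer]:
    """Concern 2 (semantics): translate source terms into the target model."""
    code = COUNTRY_ALIASES.get(cleaned["country"])
    if code is None:
        return None
    return Customer(email=cleaned["email"], country=code)


def match(a: Customer, b: Customer) -> bool:
    """Concern 3 (matching): the rule is stated in model terms only."""
    return a == b


def load(raw: dict) -> Optional[Customer]:
    """Run the quality and semantics steps in order; None means rejected."""
    cleaned = clean(raw)
    return to_model(cleaned) if cleaned else None


crm = load({"email": " Ann@Example.com ", "country": "UK"})
billing = load({"email": "ann@example.com", "country": "United Kingdom"})
```

With the rules split this way, `match(crm, billing)` compares two records that arrived with different spellings and different country labels, and a change to one source only touches the alias table.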

Capgemini’s testing offering is model driven

The Capgemini testing offering to the banking sector can be seen here http://www.uk.capgemini.com/testing-services/quality-testing-for-banking.

My interpretation is that they have packaged some existing model driven offerings into something their consultants can use on site.

Thoughts:

  • Hard coded: The domains they cover include payments and credit cards. The model driven solutions they use look like branded offerings from other vendors. Their solution does not look like it can drive model driven testing into other domains.
  • Scalable: They like model driven approaches because new knowledge and know-how can be built into the models as they go. The improved models then deploy to other clients for free.
  • Robust: By putting the capability into the tooling, senior management can be more confident that their consultants on the ground will do a good job.

Maybe I should start up a conversation with them. Our modelDT tool delivers these advantages in the general case. http://www.modeldrivers.us/modelDT_home

The disadvantage of not being hard coded is the setup required for your business domain. But then the scope is everything, not just what others have done already.
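
The trade-off can be sketched in a few lines of Python. This is not how modelDT or the Capgemini offering works internally; it is a hypothetical illustration with made-up names (`Field`, `generate_cases`, `PAYMENT_MODEL`). The generator is generic, and the only domain-specific part is the model you have to set up yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One modelled data point with example valid and invalid values."""
    name: str
    valid: tuple
    invalid: tuple


def generate_cases(model):
    """Yield (inputs, expected_ok) pairs derived purely from the model.

    One all-valid baseline case, then one case per invalid value with
    every other field held at its first valid value.
    """
    baseline = {f.name: f.valid[0] for f in model}
    yield dict(baseline), True
    for f in model:
        for bad in f.invalid:
            case = dict(baseline)
            case[f.name] = bad
            yield case, False


# The domain setup: this is the part a hard-coded offering ships ready-made.
# Values are invented for the example.
PAYMENT_MODEL = [
    Field("amount", valid=(10.0,), invalid=(-1.0, 0.0)),
    Field("currency", valid=("GBP", "USD"), invalid=("XXX",)),
]

cases = list(generate_cases(PAYMENT_MODEL))
```

Swapping `PAYMENT_MODEL` for a model of another domain reuses `generate_cases` unchanged, and adding a new invalid value to the model adds a test case without touching the generator.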

ModelDT: how to industrialize testing

Just posted to SlideShare: the slides presented at the NoMagic UML conference.

http://www.slideshare.net/greg.soulsby/model-dt-how-to-industrialize-testing

“In this presentation you will learn steps towards making your testing: – Correct – Scalable – Agile – Low cost”

New webinar – Introducing OCL: fast ways to deploy business rules

Tricia Balfe from Nomos Software has offered me an online introduction to OCL. The conversation started here: getting-benefit-from-ocl-rules-with-nomos

She said she will show me how to deploy the business rules into Java. I asked her to keep it simple! She has set up a webinar and said anyone can join. Tricia is a genuine expert, so it will be good. Webinar: Introducing OCL: fast ways to deploy business rules

A great model driven success story

There are not many documented, well thought through success stories in the model driven world. Tricia Balfe at Nomos Software has written one that is really well considered. Not only that, it is not about her or Nomos; it is about a globally used solution for the finance industry. http://nomos-software.com/blog/a-model-driven-success-story