Back to blog

Data Modelling on Qlik- Part 2

December 26, 2016 - Posted in Analytics , Qlik View Posted by:

Tags: , , ,

Last time I mentioned in my blog “The only harm perhaps that synthetic keys cause is that they destroy the readability of your Data Model”. Some of the beginners did not quite understand this well enough and I realized this may need some elaboration.

My next exercise will just highlight the same.

In this exercise we will build on to the Data Model we had arrived on in Exercise 2, which is reproduced again below for the sake of continuity:image-1-qlik-blog-2

Exercise 3:

To the Data Model created in Exercise 2, add the following four tables. The tables are self-explanatory. You may note the following regarding the tables:

  • Invoices table links to the Sales table.
  • Similar to the Customer, SalesOrders and Invoices tables, for the incoming supply of goods the Suppliers, Purchases and Receipts tables have been added.

Just bring these tables to the Data model like you did in Exercise 2, and observe what happens to your data model.

  1. Invoices capture1
  2. Suppliers                                      capture2
  3. Purchases               capture3
  4. Receipts           capture4

Solution to Exercise 3:

You may observe that you will get a Data Model which is something like this:image-2-qlik-blog-2

The examples we are discussing so far are trivial, and even in such a trivial case also you may see it becomes pretty unintuitive to understand this Data Model containing Synthetic Keys. In a Production Scenario where many complex scenarios are expected, the data model may become a pain to understand for any new comer on the project.

Getting back to where we left off last time, let me now introduce the concept of link table. I signed off the blog asking you to try creating a Link Table instead of the Synthetic Key Table introduced by the system in Exercise 2. If you have not tried it, and you are a beginner, I recommend you try eliminating the Synthetic Key yourself first, before going through the following solution.

The solution will go something like this:image-3-qlik-blog-2

If you look at the Data Model we haven’t done much except bring in a Link Table with meaningful field names instead of the system generated Synthetic Key table. The table “PartDateLink” introduced by us here contains all the combinations of PartNo and Date fields available in the entire Data Model.

The last point is important and is explained further. In our model PartNo and Date field is available in two tables viz. Sales and Inventory. The Link Table is a UNION of all PartNo~Date combinations found in both the tables. A PartNo~Date combination existing in both the Sales and the Inventory tables should appear only once in the Link table. For the Data Model that we just created with all tables populated as per the exercises, a preview of the “PartDataLink” looks like follows:image-4-qlik-blog-2

I hope this blog has helped you in understanding the concept of Link Table. In a simpler data model, like what we just handled a Link table may not seem to be giving much advantage over the Synthetic Key automatically created by the system. However the next exercise will help you to appreciate Link Tables better.

Exercise 4:

To the Data Model created in Exercise 3, try eliminating the Link table.

While I find time to write the next piece in this series you may on self-effort try to solve the above exercise.

Read Part 1 : Here

Avatar

Author: Yogesh Pathak

Co-Founder and Head – BI, Big Data & Analytical Services at Technoforte Software Pvt. Ltd.

3 Comments

David Dumas 3 years ago

Please correct me if I am reading this wrong, but I believe you last data model screenshot suggests the end goal is to be able to correlate sales and inventory levels over time by customer and part. I realize that customer is not a conformed dimension across inventory. Metrics such as “Months of Inventory”, and “Inventory turn rate” come to mind. That said, I believe an alternative model may be to concatenate sales and inventory together into a central fact. Part and time then become conformed dimensions over this central fact, and customer analysis is limited to sales metrics. I may not be understanding something in your end goals, but that would be the model I would use with the understanding that I have.
Dave

Reply

Yogesh Pathak 3 years ago

You are right Dave. That’s how I would also design this using a Star Schema.

If you go through Part 1 of my blog, I clarify that “The initial exercises shall be trivial since the blog is being written for the absolute newbie but as we proceed I shall bring in more and more complexities”

In this context, current model is defined the way it is for the beginner with the aim of helping him/her with the concept of Link Tables.

Start schema is not introduced yet in this series…

Regards
Yogesh

Reply