…And How to be in a Good Relationship
The following is an excerpt from our new white paper: Joins, Aggregations and Many-to-Many Relationships: A Cheat Sheet. You can download the complete guide right here.
The Many Faces of Data Relationships
While having a many-to-many relationship may sound steamy to some of you, when it comes to the data in your tables it can get complicated. That is, unless you understand the different scenarios, their resolutions, and how to build a good relationship with your data. Just as a general overview, a relationship specifies the logic used to combine data from one or more tables. You create relationships by connecting fields between two or more tables, and this determine what data is reflected in a dashboard widget (such as a pivot table). There can be 3 types of relationships:
- One-to-One relationship – In this scenario, the field used to connect both tables only has unique values in every row.
- One-to-Many relationship – In this scenario one table holds unique values for every row, but the other table holds duplicate values for any or all of the corresponding values in the first table.
- Many-to-Many relationship – In this scenario, on both sides of the table there are duplicated values, causing excessive calculations for every query run against it
Understanding a M2M Relationship and Its Consequences
Definition: You can consider the general rule that when a field from two or more tables contains the same value, and these values are duplicated in both tables, a connection has been made based on this field a many-to-many relationship is created.
We know, it’s confusing, so here’s a business case example: a hotel can have a data table with reservation data and a table with payment data. In both tables the name of the guest is stored. A guest can have multiple reservations under their name, as well as multiple payments for their stay recorded in their name. If a relationship between the reservation and payment table was created based on the guest’s name, a many-to-many relationship would be created (as the guest’s name appears multiple times in each table).
Negative Consequences: The problem with this kind of relationship is that it can create complex data sets which either: do not return the correct results, or use excessive computing resources and don’t return any results. Both scenarios lead to data havoc such as creating duplicates, incorrect results, and performance lags, which is why many people try to avoid many-to-many relationships all together.
Resolutions: There are several methods to resolve or bypass a many-to-many relationship that should be chosen based on the business model and the logic of the business questions at hand. But, the first step is to test to see if you are dealing with a many-to-many relationship (see next section), and then according to the schema logic, apply the best resolution. For a verytechnical, step-by-step explanation on how to test and resolve M2M relationships, read this post. If you want a less technical overview, just keep reading.
Determine What Kind of Relationship Your In
You can easily test if a relationship is Many-to-Many by checking the data modeling of the relationship and determining the exact number of unique and duplicate values on each side of the relationship.
- If you get the same value for both the unique and duplicate values, then there is no duplication, and this relationship is either a One-to-Many or a One-to-One. Continuing with our business case, this would mean that all Guest IDs appear only once–making all values unique. You can stop investigating at this stage given that even if the other side of the relationship has duplicate values for Guest ID, you’ll still be dealing with a One-to-Many relationship–where the unique values are the ‘Reservations’ side, and the duplicate values are on the ‘Payments’ side.
- If the number of duplicate values are higher than the number of unique values, then this side of the relationship has duplicated values and you will need to investigate the other side of the relationship:
- If the other side of the relationship yields unique values, this is a one-to-many relationship.
- If not and there are more than two tables connected to this relationship, we’ve got a many-to-many relationship on our hands.
For those of you who want to get more technical, here is a simple SQL statement you can use to check for potential M2M relationships.
Best Ways to Resolve M2M Relationships
There are many ways to resolves M2M relationships, but here are common approaches based on the number of M2M relationships that exist as well as the number of tables involved. If there are 2 tables, 1 relationship, you’re best options would be to:
- Break this relationship into two separate one-to-many relationships
- Create an aggregated table
If there are more than 2 tables, and more than 1 relationship, you’re best options would be to:
- Use the Lookup Function to copy a value from one table and import it into another based on a logical test.
- Combine the two tables into one.
The most important aspect of understanding a many-to-many relationship is to know that they need to be avoided in order to ensure you have accurate data, no duplicate values, and your performance does not lag–a negative consequence of an M2M relationship that will ruin the user experience and discourage queries. Here are 5 simple rules for building relationships when getting started with the Sisense BI software that serves as a short and sweet guide to ensure you stay in good relationships with your data.
Learn more about mashing up data sources
Read a quick guide covering:
- Joins vs. aggregations
- Avoiding many to many relationships
- Joining data sources with BI software
Download our free white paper: The Data Mash Up Cheat Sheet (direct download, PDF)