Introduction
Power Query lets you join tables based on one or more columns and using a variety of join types (inner, left outer, etc.). The join condition, however, is always based on value equality.
This is sufficient in most scenarios, such as a when joining tables like Customers and Countries based on a common country code column. Any other type of condition, such as “greater than”, wouldn’t make sense here.
What about the case where the join condition becomes more complex to a point that an equality rule is no longer sufficient? Power Query does not guide you towards a way to implement other types of conditions, such as “greater than” or “on or before” for date columns.
Problem example
Let’s form an example with two tables. The first one, Sales, represents sold products:
Date | Product | Quantity | Unit Price |
---|---|---|---|
1/20/2023 | Yellow sweater | 3 | $48.00 |
1/25/2023 | Yellow sweater | 1 | $48.00 |
2/07/2023 | Green hat | 1 | $10.00 |
3/20/2023 | Yellow sweater | 5 | $75.00 |
The second one, CostOfGoods, is a historical record of your supplier’s monthly price list. Note how there is no relation between the two tables, and especially the fact that dates do not match up:
Date | Product | Cost |
---|---|---|
1/1/2023 | Yellow sweater | $15.00 |
1/1/2023 | Green hat | $3.00 |
2/1/2023 | Yellow sweater | $17.00 |
2/1/2023 | Green hat | $4.00 |
3/1/2023 | Yellow sweater | $23.00 |
3/1/2023 | Green hat | $4.50 |
Our goal is to join the two tables such that each Sales row contains a new Unit Cost column, whose value will be the supplier’s cost at the time of the sale. This will enable us to easily calculate a margin later on.
Walkthrough
The logic to determine a sold product’s cost, in general terms, is simple:
- Look for rows matching the product
- Keep rows dated on or before the sale date
- Take the most recent row by date
Expressing such logic in Power Query can easily be done by writing a function in M code. Let’s write this function, and then see how we can use it to populate our Sales table!
To start, create a new Blank Query and open it in the Advanced Editor. The GetItemCostOnDate function definition is as follows:
|
|
Notice how the first input to this function is a reference to a table. The caller will be expected to pass CostOfGoods as an argument. While the function could access this table directly, this approach enables a significant performance optimization that will be explained later.
The function performs multiple steps to identify the correct row in which to find the product’s cost. This is where you can use the full range of M functions to express your logic (substring matching, …)!
Once saved, you should see Power Query offering you a form to invoke the function in an ad-hoc manner. Try it out to confirm it works!
With the function now available, we can now use it to populate a new Cost column in our Sales table. Open the Advanced Editor of the Sales table and insert these steps where adding the column is desired:
|
|
Things to note:
- The CostOfGoods table is buffered before being passed as a reference in the next step. This will create an in-memory copy of the table and re-use it across function invocations, which can bring a significant performance improvement.
- The GetItemCostOnDate function is called within an each statement, i.e. it will be executed for each row in our Sales table.
This gives us the desired table:
Date | Product | Quantity | Unit Price | Unit Cost |
---|---|---|---|---|
1/20/2023 | Yellow sweater | 3 | $48.00 | $15.00 |
1/25/2023 | Yellow sweater | 1 | $48.00 | $15.00 |
2/07/2023 | Green hat | 1 | $10.00 | $4.00 |
3/20/2023 | Yellow sweater | 5 | $75.00 | $23.00 |
Final words
Encapsulating complex join logic in a function isn’t the only way to achieve this task. When consuming the dataset in Power BI for example, an equivalent result could be achieved using DAX. In my opinion however, this approach is simpler and easier to trust.
An important factor to consider in evaluating this approach is that it doesn’t enable integrating report parameters in the join/filtering logic. This technique pre-computes values before the user can specify report filters, and as such the logic cannot depend on report filter selections.
Note that the function can also return an entire row (or even multiple rows), such that the “main” table could itself decide which columns to import, possibly how to aggregate values as well.
More reading
Table.Buffer (learn.microsoft.com)