12-01, 10:00–10:30 (UTC), Talk Track I
Inequality joins are less common than equality joins, but they are useful in temporal analytics and in some conventional applications. Pyjanitor fills this gap in Pandas with an efficient implementation.
Imagine a manufacturer who wants to minimise storage costs while maximising profits (increasing the inventory of the more profitable product while reducing storage for the less profitable one), or a tax audit that looks for employers who earn more but pay less tax. The problem could be as simple as efficiently finding which range of dates in one dataframe each date from another dataframe falls into. All of these problems can be solved with inequality joins. At the moment, the way to solve them in Pandas is with a cartesian join, which can be expensive memory-wise and is generally inefficient. This talk aims to show a better, more efficient way of performing inequality joins within Pandas.
The talk will describe the algorithms implemented and present some speed tests comparing their performance.
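To make the cost of the cartesian approach concrete, here is a minimal sketch in plain Python (standard library only, so it runs without Pandas or pyjanitor). It joins two lists on the single condition `left < right`, first by comparing every pair and then by sorting one side and using binary search. The function names and sample values are hypothetical, chosen for illustration, and the sorted variant is a generic technique; it is not necessarily the algorithm the talk or pyjanitor uses.

```python
from bisect import bisect_right


def naive_less_than_join(left, right):
    """Cartesian-style join: test every (left, right) pair against a < b."""
    return [
        (i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
        if a < b
    ]


def sorted_less_than_join(left, right):
    """Sort the right side once, then binary-search the first match per left value."""
    # Positions of `right` ordered by value, so original indices can be recovered.
    order = sorted(range(len(right)), key=right.__getitem__)
    sorted_vals = [right[j] for j in order]
    pairs = []
    for i, a in enumerate(left):
        # Every sorted value at or after `start` is strictly greater than a.
        start = bisect_right(sorted_vals, a)
        pairs.extend((i, order[k]) for k in range(start, len(order)))
    return pairs


# Hypothetical sample data.
left = [3, 7, 1]
right = [5, 2, 9, 7]

naive_pairs = naive_less_than_join(left, right)
fast_pairs = sorted_less_than_join(left, right)
```

Both functions return the same index pairs, but the naive version always performs `len(left) * len(right)` comparisons, whereas the sorted version only touches pairs that actually match, after one sort and one binary search per left value.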
No previous knowledge expected.
Data Engineer, love open source, contributor to pydatatable and pyjanitor.