Paco Nathan
Managing Partner at Derwen, Inc. Known as a "player/coach", with core expertise in graph technologies, natural language, data science, and cloud computing; ~40 years of tech industry experience, ranging from Bell Labs to early-stage start-ups. Board member for Recognai; Advisor for Amplify Partners, Data Spartan, KUNGFU.AI. Lead committer on PyTextRank, kglab. Formerly: Director, Community Evangelism for Apache Spark at Databricks.
Sessions
Data science practitioners have a saying that 80% of their time gets spent on data prep. Often this involves tools such as Pandas and Jupyter. Graph Data Science is similar, except that the data prep techniques are highly specialized and computationally expensive. Moreover, data prep for graphs is required before commercial tools such as graph databases or visualization can be used effectively. This talk shows examples of data prep for graphs. A progressive example illustrates the challenges, plus techniques that leverage open source integrations with the PyData stack: Arrow/Parquet, PSL, Ray, Keyvi, Datasketch, etc.