Interview with Kshitij Purwar

Talks

What happens when Stack Overflow doesn’t have an answer? Comparing ST_Within & H3 for spatial queries
Thursday, 15:40
Berlin 2+3

Could you briefly introduce yourself?

I am Kshitij Purwar, Founder and CTO of Blue Sky Analytics, a climate-tech start-up using satellite data, cloud computing, open source technology and AI to build environmental monitoring and climate-risk assessment products. We are a fully remote team distributed across India, the Netherlands and the US!

I am a college dropout and self-taught developer with over nine years of software development experience. I began working professionally in the first semester of college and joined an early-stage tech startup as a core engineer after dropping out in the third semester.

In 2018, along with my elder sister Abhilasha, I founded Blue Sky Analytics to help fight climate change with data. At Blue Sky, I lead a team of young developers and data scientists who analyze terabytes of satellite data to deliver sophisticated environmental datasets and build “SpaceTime”, a data visualisation platform to support open source and collaboration for climate action.

How do you engage with the PostgreSQL Community?

I am fairly new to the PostgreSQL community, having worked with it for only the last couple of years, primarily as a consumer. I mostly engage on Twitter and Stack Overflow as a silent observer.

I learnt a lot from the good folks at TimescaleDB in their Slack community, as we were among the earliest to put it in production, especially in combination with PostGIS to run spatio-temporal queries.

I am looking forward to contributing more to the community by publishing a few blogs and tutorials in the coming months.

Have you enjoyed previous PostgreSQL Europe conferences, either as an attendee or as a speaker? (PGConf.EU, FOSDEM PGDay, Nordic PGDay, pgDay Paris, PGConf.DE)

No, this will be my first PostgreSQL Europe conference, as I moved to the Netherlands only a few months ago.

What will your talk be about, exactly? Why this topic?

My talk is about how we solved our issues of spatial joins across a large amount of geospatial data using H3 indexing.

We deal with large geospatial datasets and run many spatio-temporal queries over them; the spatial join (point-in-polygon query) was one of the bottlenecks we encountered in many use cases. PostGIS functions like ST_Within solve this problem for us, but they are very slow when there are millions of points and the shapes are complex. My talk is about how to optimise these specific kinds of queries using H3.
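To illustrate the general idea of swapping an exact point-in-polygon test for a grid-cell lookup, here is a minimal pure-Python sketch. It is not the implementation from the talk: a plain square grid stands in for H3's hexagonal cells, and the names cell_of, polygon_to_cells and grid_join are hypothetical helpers introduced only for this example. A polygon is approximated by the set of cells whose centres fall inside it, so each point then needs only a cheap set lookup. Points close to the polygon boundary can be misclassified, and a smaller cell size reduces that error at the cost of more cells.

```python
import math


def point_in_polygon(x, y, polygon):
    """Exact ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def cell_of(x, y, size):
    """Map a coordinate to its grid cell (square-grid stand-in for an H3 index)."""
    return (math.floor(x / size), math.floor(y / size))


def polygon_to_cells(polygon, size):
    """Approximate a polygon by the cells whose centres lie inside it."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    cx0, cy0 = cell_of(min(xs), min(ys), size)
    cx1, cy1 = cell_of(max(xs), max(ys), size)
    cells = set()
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            centre_x = (cx + 0.5) * size
            centre_y = (cy + 0.5) * size
            if point_in_polygon(centre_x, centre_y, polygon):
                cells.add((cx, cy))
    return cells


def grid_join(points, polygon, size):
    """Approximate point-in-polygon join: one set lookup per point."""
    cells = polygon_to_cells(polygon, size)
    return [p for p in points if cell_of(p[0], p[1], size) in cells]


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
points = [(2.5, 2.5), (9.9, 0.1), (12, 3), (-1, 5)]
matched = grid_join(points, square, size=1)
```

The expensive polygon test runs once per cell rather than once per point, which is where the saving comes from when the same shapes are joined against millions of points.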

What is the audience for your talk?

Anybody handling large amounts of geospatial data with an interest in analyzing and optimising point-in-polygon queries.

What existing knowledge should the attendee have?

Basic knowledge of PostgreSQL, PostGIS (i.e. spatial joins) and H3 is enough to understand the talk. I’ll be giving a two-minute crash course on each anyway!

What is the one feature in PostgreSQL 15 which you like most?

Truth be told, we are yet to upgrade to version 15, but my team and I have been really excited about the latest release.

I really like the server-side compression and client-side decompression feature in pg_basebackup, which moves the native backup and restore functionality of PostgreSQL 15 in a more efficient and robust direction. For our larger databases, server-side compression makes a lot of difference.

But the in-memory and on-disk sorting performance improvements are amazing as well since they affect a lot of our queries and give us out of the box “just works” improvements.

Which other talk at this year’s conference would you like to see?

Lots of interesting topics, but here are my top 4 (I hope their timings don’t clash 🤞)

PostgreSQL at GitLab.com, always wondered how things worked in Big Companies
Hands-on Benchmarking, you can’t solve what you can’t measure
A comparison of PostgreSQL backup tools, I am glad someone tried them all & compared so others don’t have to
Performance tips you have never seen before, you can never be fast enough!

Which measure, action, feature or activity would—in your eyes—help to accelerate the adoption of PostgreSQL?

Making it more beginner-friendly and less scary. I have seen people jump to NoSQL for use cases better suited to SQL because of the ease and simplicity of NoSQL databases like MongoDB.

I wish the PostgreSQL documentation had an “Explain it like I am 5” version!