

I don't get why there are so many shady location data providers when Google Popular Times and OpenStreetMap are easy to access and let you draw similar conclusions.

Location data providers often make negative headlines. These services collect movement data from apps and aggregate it to derive movement patterns that can be useful for marketers. In fact, on two occasions I evaluated a PoC (proof of concept) with such location data brokers, and these were my takeaways:

1. They were all vague about where the data came from, which you need to know to understand the data's bias. I never got a good answer.
2. The data often covered less than 0.4% of the population (at least in Europe; the USA is a different game). For a big city, they might have 20K unique users while more than 3M people lived in the city.
3. They ignored basic principles of professional data analytics. The data came as CSV files (for larger volumes, split into something like 10 separate files), and it was not always internally plausible.

Those experiences led me to rebuild parts of what these data brokers offer, using only open-source data:

1. OpenStreetMap: If you work with location data, you should know OpenStreetMap. It's the biggest database of metadata about locations. It's not perfect, but big companies like Mapbox, Apple, and Microsoft rely on it. Since its API is somewhat messy, you can use this repository to load the information for entire cities smoothly into PostgreSQL --> https://github.com/kuwala-io/kuwala/blob/master/kuwala/pipelines/osm-poi/README.md

2. Google Popular Times: Movement data can also be found on Google. When you search for a location, Google often shows how busy the place typically is at different times (as an index from 0 to 100). With this library, you can access all the Popular Times data for individual locations and entire cities --> https://github.com/kuwala-io/kuwala/blob/master/kuwala/pipelines/google-poi/README.md


3. Global Admin Boundaries: A common pain point when working with location data is aggregating it into geographic slices (country level, administrative level, or even smaller sub-districts). This repo provides geo boundaries extracted and cleaned from OpenStreetMap. It covers the whole world, at granularities from very broad to very fine --> https://github.com/kuwala-io/kuwala/blob/master/kuwala/pipelines/admin-boundaries/README.md
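
To illustrate how items 1 and 3 fit together, here is a minimal sketch. It extracts POI nodes from OpenStreetMap-style elements, stores them in a table, and counts them per district with a ray-casting point-in-polygon test. This is not the code of the kuwala pipelines linked above, and it rests on several assumptions:

* Elements follow the shape of Overpass API `[out:json]` output (`type`, `id`, `lat`, `lon`, `tags`).
* Only nodes tagged `amenity`, `shop`, or `tourism` count as POIs.
* Python's built-in `sqlite3` stands in for PostgreSQL.
* Coordinates are treated as planar (lon, lat) pairs.
* The district names and square polygons are made up, and polygon holes are ignored.

For real boundary data, PostGIS (`ST_Contains`) or Shapely would be the more robust choice.

```python
import sqlite3
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar coordinates

# Tag keys treated as marking a point of interest (a simplifying assumption).
POI_KEYS = ("amenity", "shop", "tourism")


def load_pois(conn: sqlite3.Connection, elements: List[dict]) -> int:
    """Store POI nodes from Overpass-style elements; return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS poi "
        "(osm_id INTEGER PRIMARY KEY, name TEXT, category TEXT, lon REAL, lat REAL)"
    )
    rows = []
    for el in elements:
        tags = el.get("tags", {})
        key = next((k for k in POI_KEYS if k in tags), None)
        # Ways and relations would need a centroid first; only nodes are kept here.
        if el.get("type") == "node" and key is not None:
            rows.append((el["id"], tags.get("name"), f"{key}={tags[key]}", el["lon"], el["lat"]))
    # SQLite conflict-resolution syntax; PostgreSQL would use INSERT ... ON CONFLICT.
    conn.executemany("INSERT OR REPLACE INTO poi VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM poi").fetchone()[0]


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle on each edge crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_per_district(conn: sqlite3.Connection, districts: Dict[str, List[Point]]) -> Counter:
    """Assign each stored POI to the first district containing it and count."""
    counts: Counter = Counter()
    for lon, lat in conn.execute("SELECT lon, lat FROM poi ORDER BY osm_id"):
        name = next(
            (d for d, poly in districts.items() if point_in_polygon((lon, lat), poly)),
            "unassigned",
        )
        counts[name] += 1
    return counts


# Hypothetical sample data.
districts = {
    "north": [(0, 1), (2, 1), (2, 2), (0, 2)],
    "south": [(0, 0), (2, 0), (2, 1), (0, 1)],
}
elements = [
    {"type": "node", "id": 1, "lon": 0.5, "lat": 1.5, "tags": {"amenity": "cafe", "name": "Cafe A"}},
    {"type": "node", "id": 2, "lon": 1.5, "lat": 1.8, "tags": {"shop": "bakery"}},
    {"type": "node", "id": 3, "lon": 1.0, "lat": 0.5, "tags": {"tourism": "museum"}},
    {"type": "node", "id": 4, "lon": 3.0, "lat": 3.0, "tags": {"amenity": "bench"}},
    {"type": "node", "id": 5, "lon": 1.0, "lat": 1.2, "tags": {"highway": "crossing"}},
    {"type": "way", "id": 6, "nodes": [1, 2], "tags": {"amenity": "parking"}},
]

conn = sqlite3.connect(":memory:")
print(load_pois(conn, elements))  # 4
print(dict(count_per_district(conn, districts)))  # {'north': 2, 'south': 1, 'unassigned': 1}
```

Testing every polygon for every point is fine for a handful of districts. For city-scale data, a spatial index like the one PostGIS provides avoids that quadratic cost.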

I think that with these open-source tools and some data science magic, you can generate outcomes similar to those of the location data providers, but fully anonymized and for free. It would be awesome if anybody is interested in building a case around it :-)
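
One hypothetical form that "data science magic" could take is to combine Popular Times with district labels into an hourly activity profile per district. The result contains no individual-level data at all. This is a sketch, not the kuwala output format, and it assumes the following:

* The field names `district` and `popular_times` are made up.
* Each place carries 24 hourly index values (0 to 100) for one day.
* The district label comes from a boundary lookup done beforehand.

Google describes the Popular Times graph as relative to each place's own typical peak. The averages are therefore a rough relative activity indicator, not visitor counts.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List


def district_profiles(places: Iterable[dict]) -> Dict[str, List[float]]:
    """Average the hourly Popular Times index (0-100) of all places per district."""
    by_district: Dict[str, List[List[int]]] = defaultdict(list)
    for place in places:
        values = place["popular_times"]  # hypothetical field: 24 hourly values
        if len(values) != 24 or any(not 0 <= v <= 100 for v in values):
            raise ValueError(f"bad Popular Times profile: {values!r}")
        by_district[place["district"]].append(values)
    # zip(*profiles) regroups the values hour by hour across all places.
    return {
        district: [round(fmean(hour), 1) for hour in zip(*profiles)]
        for district, profiles in by_district.items()
    }


def peak_hour(profile: List[float]) -> int:
    """Return the first hour of the day with the highest average index."""
    return max(range(len(profile)), key=profile.__getitem__)


# Hypothetical sample: two places in "north", one in "south".
places = [
    {"district": "north", "popular_times": [0] * 8 + [50] * 10 + [0] * 6},
    {"district": "north", "popular_times": [0] * 8 + [30] * 9 + [90] + [0] * 6},
    {"district": "south", "popular_times": [0] * 12 + [80] * 6 + [0] * 6},
]
profiles = district_profiles(places)
print(profiles["north"][9], peak_hour(profiles["north"]))  # 40.0 17
```

A plain mean weights a quiet corner shop the same as a busy station. Weighting places by some size proxy would be an obvious next step, but the choice of proxy is left open here.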

/r/datasets
https://redd.it/v1192a