Extracting More Data From The Overpass API (Part 3) Extracting More Data From The Overpass API (Part 3)

December 24, 2021

laravel openstreetmap

I continued developing the parseXMl.php file to get 2 more sets of data. An address, which is a combination of $housenumber . " " . $street . " " . $city . " " . $postcode; and also a type, which is a combination of sport, leisure, shop, amenity, office which gives an idea of what the object is.

I have added fields into the database table to store these like so:

CREATE TABLE `points` (
  `id` varchar(11) NOT NULL,
  `website` varchar(255) NOT NULL,
  `address` varchar(255) NOT NULL,
  `type` varchar(255) NOT NULL,
  `twitter` varchar(255) NOT NULL,
  `name` varchar(255) NOT NULL,
  `point` point NOT NULL,

  PRIMARY KEY (`id`),
  SPATIAL KEY `point` (`point`)
) ENGINE=InnoDB;

so now the parseXml.php file looks like this

parseXml.php

<?php

$xmlDataString = file_get_contents('osmData.json');
$xmlObject = simplexml_load_string($xmlDataString);

$json = json_encode($xmlObject);
$phpDataArray = json_decode($json, false ); 

foreach($phpDataArray as $value){
    foreach($value as $row){
        $name = "";
        $twitter = "";
        $housenumber = "";
        $street = "";
        $city = "";
        $postcode = "";
        $type = "";

        foreach($row->{'tag'} as $attributes){
            if($attributes->{'@attributes'}->{'k'} == 'name'){
                $name = ($attributes->{'@attributes'}->{'v'});
            }

            if($attributes->{'@attributes'}->{'k'} == 'contact:twitter'){
                $twitter = ($attributes->{'@attributes'}->{'v'});
            }

            if($attributes->{'@attributes'}->{'k'} == 'addr:housenumber'){
                $housenumber = ($attributes->{'@attributes'}->{'v'});
            }
            if($attributes->{'@attributes'}->{'k'} == 'addr:street'){
                $street = ($attributes->{'@attributes'}->{'v'});
            }
            if($attributes->{'@attributes'}->{'k'} == 'addr:city'){
                $city = ($attributes->{'@attributes'}->{'v'});
            }
            if($attributes->{'@attributes'}->{'k'} == 'addr:postcode'){
                $postcode = ($attributes->{'@attributes'}->{'v'});
            }

            if($attributes->{'@attributes'}->{'k'} == 'sport'){
                $type = ($attributes->{'@attributes'}->{'v'});
            }
            if($attributes->{'@attributes'}->{'k'} == 'leisure'){
                $type = ($attributes->{'@attributes'}->{'v'});
            }
            if($attributes->{'@attributes'}->{'k'} == 'shop'){
                $type = ($attributes->{'@attributes'}->{'v'});
            }
            if($attributes->{'@attributes'}->{'k'} == 'amenity'){
                $type = ($attributes->{'@attributes'}->{'v'});
            }
            if($attributes->{'@attributes'}->{'k'} == 'office'){
                $type = ($attributes->{'@attributes'}->{'v'});
            }

            if($attributes->{'@attributes'}->{'k'} == 'website'){

                $address = $housenumber . " " . $street . " " . $city . " " . $postcode;            
                $address = trim($address);

                // make the sql using INSERT IGNORE to prevent errors
                echo "INSERT IGNORE INTO `points` (`id`, `website`, `address`, `type` , `twitter`, `name`, `point`) ";
                echo "VALUES(" . $row->{'@attributes'}->id .", '" . addslashes($attributes->{'@attributes'}->{'v'}) . "', '". addslashes($address) . "', '". addslashes($type) . "', '". $twitter . "', '" . addslashes($name) . "', POINT(" . $row->{'@attributes'}->lat . " , " . $row->{'@attributes'}->lon ." ));"; 

                echo PHP_EOL;
            }
        }
    }
}

running it using the same 2 commands, the first processes the xml file and outputs sql inserts, the second inserts it into the db.

php parseXml.php > points1.sql
mysql -u root -p dbtable < points1.sql

There are a few other fields I was looking at, including facebook, contact:facebook, contact:instagram, contact:linkedin, linkedin, addr:country, email, phone, contact:phone but these dont actually hold much data. So I am going to focus on displaying the results on the location pages to list the nearby entries.

NB: The type field in the code above only takes the value of the last field that exists so office is more important than amenity etc... I did notice that some have multiple values separated by semi-colons (eg. spinning;fitness;yoga) I will check to see if these work in the overpass api results.

I am also planning to run a web spider/crawler on these results to see which pages are working (non 404 results) and also extract more of the external social media links using my Social Media Link Extractor package but that is for a future post. I would be interested in adding this data to the overpass api automatically, but not sure that is allowed.


If you would like to contact me with this form on londinium.com, ilminster.net or via Twitter @andylondon