PHP Functions for Domain Names and IP Addresses

December 17, 2022

I am working on a project that involves lots of domain names, and it needs to split the actual domain name and the top level domain out of each URL.

The PHP function parse_url is a good starting point for extracting domain names, but I also wanted the top level domain, which is trickier.

If you split the domain by the dots from the right hand side, '.com', '.org' and '.net' are all single levels. '.co.uk', '.me.uk' and '.gov.uk' are 2 levels from the right, but there are also domains like '.homeoffice.gov.uk', and the longest I found was 'nodes.k8s.nl-ams.scw.cloud'. So counting dots is not enough, and I needed another way to extract this information.
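As a rough sketch of that starting point, the helper below pulls the host out of a URL with parse_url. The function name hostFromUrl is made up for this example, and it assumes that input without a scheme (like 'www.example.com/path') should be treated as http, because parse_url only fills in the host when it can see a scheme.

```php
<?php
// Hypothetical helper, not part of PHP or of the original project.
function hostFromUrl(string $url): ?string
{
    // parse_url() only reports a host when the string starts with a scheme,
    // so prepend one for bare input such as 'www.example.com/path'.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);

    // parse_url() returns null or false when no host can be found.
    return is_string($host) ? strtolower($host) : null;
}

echo hostFromUrl('https://www.example.co.uk/path?x=1'), PHP_EOL; // www.example.co.uk
```

This only gives you the full host name. Working out which part of it is the suffix still needs a list of known suffixes.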
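To illustrate why a list of suffixes is needed rather than a fixed number of dots, here is a minimal sketch of longest-suffix matching. The suffix list is a tiny hand-picked sample and the function name splitBySuffix is invented for this example. Real suffix lists also have wildcard and exception rules, which this ignores.

```php
<?php
// Tiny illustrative sample only; a real suffix list has thousands of entries.
const SAMPLE_SUFFIXES = ['com', 'org', 'net', 'uk', 'co.uk', 'me.uk', 'gov.uk'];

// Hypothetical helper: returns the domain and suffix, or null if nothing matches.
function splitBySuffix(string $host, array $suffixes = SAMPLE_SUFFIXES): ?array
{
    $labels = explode('.', strtolower($host));

    // Start at 1 so at least one label is left in front of the suffix,
    // and try the longest candidate first so 'co.uk' wins over 'uk'.
    for ($i = 1; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            return [
                'domain' => $labels[$i - 1] . '.' . $candidate,
                'suffix' => $candidate,
            ];
        }
    }

    return null; // no known suffix, e.g. 'localhost'
}

print_r(splitBySuffix('www.example.co.uk'));
```

For 'www.example.co.uk' the loop rejects 'example.co.uk', then matches 'co.uk', so the domain comes out as 'example.co.uk'.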

I discovered the repo github.com/jeremykendall/php-domain-parser on GitHub, which seems to be the best way in PHP to extract this from a full domain name. It uses the Public Suffix List project (https://github.com/publicsuffix/list) as its source of truth for domain suffixes and works great.

One little gotcha I encountered was that it needs the php-intl extension, which I didn't have, so Composer installed a very old version of the package that didn't work. Once I had installed the extension, I had to force Composer to upgrade the package by changing the version in composer.json from 1.4 to

"jeremykendall/php-domain-parser": "^6.1",

Here is an extract of the code I used to get the suffix and the domain from a url:

    require 'vendor/autoload.php'; // Composer autoloader

    use Pdp\Rules;
    use Pdp\Domain;

    // A downloaded copy of the Public Suffix List
    $publicSuffixList = Rules::fromPath('public_suffix_list.dat');

    $url = 'www.pref.okinawa.jp'; // <- any url with the http part removed

    // Skip IP addresses (inet_pton returns false for anything that is not one;
    // @ silences its warning) and require at least one dot.
    if (@inet_pton($url) === false && strpos($url, '.') !== false) {
        $domain = Domain::fromIDNA2008($url);
        $result = $publicSuffixList->resolve($domain);
        echo $result->registrableDomain()->toString(), PHP_EOL; // displays 'pref.okinawa.jp'
        echo $result->suffix()->toString(), PHP_EOL;            // displays 'okinawa.jp'
    }

The PHP function inet_pton($url) returns false unless $url is a valid IP address. The database I am checking has the 'feature' that some of its entries are IP addresses, so this check skips values like '1.2.3.4', which would otherwise cause an error. I also check that the domain has at least one '.' in it.
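An alternative to inet_pton for this kind of check is filter_var with FILTER_VALIDATE_IP, which returns false for non-IP input without raising a warning. The sketch below is not what the project uses. The function name looksLikeHostname is invented here, and it assumes the input already has the scheme and path removed.

```php
<?php
// Hypothetical helper: true when the value is worth passing to the domain parser.
function looksLikeHostname(string $value): bool
{
    // FILTER_VALIDATE_IP accepts both IPv4 and IPv6 addresses.
    if (filter_var($value, FILTER_VALIDATE_IP) !== false) {
        return false; // an IP address, not a domain name
    }

    // Compare with false explicitly: strpos() returns 0 for a leading dot.
    return strpos($value, '.') !== false;
}

var_dump(looksLikeHostname('www.pref.okinawa.jp')); // bool(true)
var_dump(looksLikeHostname('1.2.3.4'));             // bool(false)
```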

I wrote another post recently on looking up the DNS status of a domain, which is a useful and quick initial test of a domain. It uses the PHP function dns_get_record (dns-get-record.php).

Another cool piece of domain-related code I stumbled upon extracts the domain in a MySQL SELECT query, like so:
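For reference, a quick DNS test along those lines might look like the sketch below. It is not the code from the other post. The function name hasDnsRecords and the choice of record types (A, AAAA, MX) are assumptions, and the optional $lookup parameter exists only so the logic can be tried without a network connection.

```php
<?php
// Hypothetical helper: true if the domain has at least one A, AAAA or MX record.
function hasDnsRecords(string $domain, ?callable $lookup = null): bool
{
    // Default lookup wraps dns_get_record(); @ hides the warning it raises
    // when the query fails.
    $lookup = $lookup ?? static function (string $name, int $type) {
        return @dns_get_record($name, $type);
    };

    foreach ([DNS_A, DNS_AAAA, DNS_MX] as $type) {
        $records = $lookup($domain, $type);
        // dns_get_record() returns an array of records, or false on failure.
        if (is_array($records) && count($records) > 0) {
            return true;
        }
    }

    return false;
}

// Usage (needs network access):
// var_dump(hasDnsRecords('example.com'));
```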

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(website, '/', 3), '://', -1), '/', 1), '?', 1) AS domain FROM domains; 
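To see what each nested SUBSTRING_INDEX call is doing, here is a rough PHP port of the same chain. The function substringIndex is a hand-written approximation of MySQL's behaviour, not a built-in, and domainFromWebsite is a name made up for this example.

```php
<?php
// Approximation of MySQL's SUBSTRING_INDEX(str, delim, count):
// count > 0 keeps everything before the count-th delimiter from the left,
// count < 0 keeps everything after the count-th delimiter from the right.
function substringIndex(string $str, string $delim, int $count): string
{
    if ($delim === '' || $count === 0) {
        return '';
    }
    $parts = explode($delim, $str);
    $kept = $count > 0
        ? array_slice($parts, 0, $count)
        : array_slice($parts, $count);

    return implode($delim, $kept);
}

// Same four steps as the SQL query, from the inside out.
function domainFromWebsite(string $website): string
{
    $step = substringIndex($website, '/', 3); // 'https://host' (drops the path)
    $step = substringIndex($step, '://', -1); // 'host' (drops the scheme)
    $step = substringIndex($step, '/', 1);    // handles input with no scheme
    return substringIndex($step, '?', 1);     // drops any query string
}

echo domainFromWebsite('https://www.example.com/path?x=1'), PHP_EOL; // www.example.com
```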

So now I can use this to create a master list of domains for my project and remove the domains I do not want.


If you would like to contact me, use the form on londinium.com or ilminster.net, or reach me via Twitter @andylondon.