I spent a whole day wrestling with this encoding stuff, because PHP has many functions that are not UTF-8 compatible, or at least don’t return UTF-8 data. I tried to fix all the weird characters with different encoding and htmlentities approaches, but three of my servers each showed completely different hieroglyphs, and special characters were still acting a bit weird.
Long story short, the fix was to do some HTML entities magic before handing the markup to DOMDocument. Maybe it’s not necessary on your server? Who knows… Anyway, here’s my approach to getting a page title with PHP.
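To make the "entities before DOMDocument" idea concrete, here is a minimal sketch. It is not the helper from the repository: the function name `extract_title` is made up for illustration, it assumes the input is already UTF-8, and it assumes the mbstring extension is available. It uses `mb_encode_numericentity()` to turn every non-ASCII character into a numeric entity, so DOMDocument never has to guess the encoding.

```php
<?php
// Hypothetical helper, for illustration only (not the author's actual code).
// Assumes $html is UTF-8 and that the mbstring extension is loaded.
function extract_title(string $html): ?string
{
    // DOMDocument::loadHTML() rejects empty input on PHP 8, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    // Replace every character from U+0080 upwards with a numeric entity
    // (e.g. "é" becomes "&#233;"), leaving plain ASCII markup untouched.
    $html = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    // Collect libxml parse warnings internally instead of sending them
    // to the error log, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null;
    }

    // textContent comes back as UTF-8; strip surrounding whitespace.
    return trim($titles->item(0)->textContent);
}
```

Calling `extract_title($html)` on a fetched page should give back the title as a UTF-8 string, or `null` when the page has no `<title>` element.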
You can find my helper on GitHub.
Updates:
- Badly formed HTML should no longer flood your error log (libxml)
- Added multiple user agents to curl
- Added browser headers to curl
- Helper now follows redirects
- It now removes whitespace around the title
With headers and a user agent set in curl, it now also returns the title from sites like Facebook.
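The curl side of those updates could look roughly like the sketch below. Again, this is an assumption-laden illustration and not the helper itself: the names `USER_AGENTS`, `build_curl_options` and `fetch_html` are hypothetical, the user agent strings and header values are just plausible examples, and the timeout and redirect limits are arbitrary. It assumes the curl extension is installed.

```php
<?php
// Hypothetical pool of user agent strings; swap in whatever you prefer.
const USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
];

// Build the curl options separately so they are easy to inspect and tweak.
function build_curl_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_MAXREDIRS      => 5,     // arbitrary limit to avoid redirect loops
        CURLOPT_TIMEOUT        => 10,    // arbitrary timeout in seconds
        CURLOPT_ENCODING       => '',    // accept any compression curl supports
        CURLOPT_USERAGENT      => USER_AGENTS[array_rand(USER_AGENTS)],
        CURLOPT_HTTPHEADER     => [
            // Browser-like headers (example values only).
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ],
    ];
}

// Fetch a page body, or null if the request failed.
function fetch_html(string $url): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url));
    $body = curl_exec($ch);

    return is_string($body) ? $body : null;
}
```

`fetch_html('https://example.com/')` would then return the raw HTML, ready to be passed on to the title-parsing step. Whether a given site actually answers depends on the site, so treat the header and user agent values as a starting point.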
