What is the effect of ligatures on the web, regarding SEO?

All designers love ligatures, but vanilla web typography just sucks. I heard of SmartyPants, which solves many of those issues, bringing sexy quotes, gorgeous ampersands and all. It’s just perfect, visually.

But I’m worried about SEO. Let’s say that on a web page, the word finally becomes ﬁnally (set with the single ligature character U+FB01 in place of f and i). Are search engines capable of indexing that word and returning the page when someone searches for finally (without the ligature)?

Answer

Summary

If your server dishes out pages with ligatures (like SmartyPants does), search engines are inconsistent. Bing currently doesn’t index ligatures correctly. I’d say that in general, it’s asking for trouble. Since search engines change, there’s a method below that you can use to test how the search engines you’re interested in index ligatures.

If your server dishes out pages with regular text and JavaScript then turns that text into ligatures, that’s fine. Search engines don’t apply JavaScript content changes before indexing pages (although it’s claimed that there are a few exceptions, like loading Facebook comments). Since there’s an industry standard method for giving search engines dynamic content, and this method is endorsed by Google, it’d be a big surprise if this changed in future. Google advise browsing your site using a plain-text browser (e.g. they suggest Lynx) to see your content as a search engine sees it.
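If you’re not sure which of these two cases applies to you, one rough way to find out is to look at the raw markup your server sends (before any JavaScript runs) and count the ligature characters in it. Here’s a minimal Python sketch; the function name `count_ligatures` is made up for illustration, and it only looks at the seven Latin ligatures in the Unicode range U+FB00 to U+FB06, whether they appear as literal characters or as HTML entities:

```python
import html
from collections import Counter

# The Latin ligature code points in the Alphabetic Presentation Forms block:
# U+FB00 (ff), U+FB01 (fi), U+FB02 (fl), U+FB03 (ffi), U+FB04 (ffl),
# U+FB05 (long s + t), U+FB06 (st).
LATIN_LIGATURES = {chr(cp) for cp in range(0xFB00, 0xFB07)}


def count_ligatures(markup: str) -> Counter:
    """Count ligature characters in served markup (hypothetical helper)."""
    # Decode entities first so that "&#xFB01;" is counted like a literal U+FB01.
    text = html.unescape(markup)
    return Counter(ch for ch in text if ch in LATIN_LIGATURES)


served = "<p>&#xFB01;nally, a \ufb01ne page</p>"
print(count_ligatures(served))
```

An empty result means the markup falls into the “regular text” case; anything else means ligature characters are going out in the code itself.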


More detail on the first case (pages dished out with ligatures in the code)…


In theory

…it shouldn’t make any difference to a well-set-up search engine.

First, it helps to understand the difference between glyphs and characters. A ligature such as ﬁ is one glyph that stands for two characters, f and i. How software treats it is up to that software and depends on context and the task at hand. You’ll see from the examples in that linked question that when you copy and paste glyphs, what gets pasted will vary: sometimes the glyph is pasted, and sometimes the glyph is treated as its associated characters, so f and i are pasted.

Any well-made automatic text processor that is interested in text semantics (search engines, spell checkers, screen readers…) should treat a glyph as identical to the characters it stands for, and should treat ﬁnally (with the ligature) as identical to finally (without it), because that’s the textual meaning of the glyph.
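To make “treat the glyph as the characters it stands for” concrete: Unicode defines a compatibility decomposition for the ligature character U+FB01 that maps it to plain f followed by i, and NFKC normalization applies it. The sketch below shows that folding in Python. Whether any particular search engine does this internally is an assumption on my part, and the name `fold_ligatures` is just for illustration:

```python
import unicodedata


def fold_ligatures(text: str) -> str:
    """Replace compatibility characters (ligatures included) with plain ones."""
    # NFKC applies Unicode compatibility decompositions, then recomposes.
    # It also folds other compatibility forms, such as full-width letters,
    # so it is broader than a ligature-only replacement.
    return unicodedata.normalize("NFKC", text)


with_ligature = "\ufb01nally"   # U+FB01 followed by "nally"
print(fold_ligatures(with_ligature) == "finally")
```

A processor that folds text like this before comparing it will see the two spellings as the same word.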

In practice

Not everything is well-made…

Here’s an easy way to test search engines. Here’s a line of text from that other question:

Copy the ligature ﬁ from Illustrator to this input box

If we take the non-ligature version of that sentence and search for it in double quotes:

(searching on "Copy the ligature fi from Illustrator to this input box"):

  1. …if a search engine treats ligature glyphs as matches for the characters they represent, it will find that page (and, when it’s indexed, this one)
  2. …if a search engine treats ligature glyphs as different to the characters they represent, it’ll find nothing until this page is indexed; then it’ll find only this page, and searches with the ligature version will find that page.
  3. …if a search engine freaks out completely at the sight of glyphs like ligatures, it’ll find nothing, not even this page, and searches with the ligature version will also find nothing.
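The difference between outcomes 1 and 2 can be mimicked with a toy phrase matcher. This is only a sketch of the idea, not how any real search engine works: `phrase_matches` and its `fold` flag are hypothetical, and the folding step is assumed to be Unicode NFKC normalization.

```python
import unicodedata


def phrase_matches(page_text: str, query: str, fold: bool) -> bool:
    """Toy exact-phrase search, with or without ligature folding."""
    if fold:
        # Assumed behaviour of a "type 1" engine: normalize both sides first.
        page_text = unicodedata.normalize("NFKC", page_text)
        query = unicodedata.normalize("NFKC", query)
    return query in page_text


page = "Copy the ligature \ufb01 from Illustrator to this input box"
plain_query = "Copy the ligature fi from Illustrator to this input box"

print(phrase_matches(page, plain_query, fold=True))   # type 1 behaviour
print(phrase_matches(page, plain_query, fold=False))  # type 2 behaviour
print(phrase_matches(page, page, fold=False))         # ligature query, type 2
```

With folding, the plain query finds the ligature page. Without it, only a query containing the ligature character itself does.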

Some quick results for the world’s top 5 search engines (links are to search results):

  • Google: Good (type 1). (Despite the comment below, it copes fine with both Unicode and HTML entity formatting.)
  • Bing: Fail (type 2).
  • Yahoo: Fail (type 2) (turns out Yahoo is “Powered by Bing”)
  • Yandex (Russian): Good (type 1)
  • Baidu (Chinese): erm, no graphicdesign.stackexchange.com pages seem to appear in Baidu searches at all… maybe we’re banned there…?!

Attribution
Source: Link, Question Author: TKrugg, Answer Author: Community
