It seems that there are some control characters in that text, specifically U+200C (ZERO WIDTH NON-JOINER), which are interfering with OpenType shaping. If they are removed, the glyphs align correctly. Do you know whether they are there for a reason, or were they perhaps added accidentally by some other system?
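For anyone who wants to reproduce the test, here is a minimal sketch of counting and stripping U+200C from a string in JavaScript. The function name `stripZwnj` is made up for illustration and is not part of any library. Note that a zero width non-joiner can be intentional in Bengali text, so removing it is best treated as a diagnostic step rather than a fix.

```javascript
// Hypothetical helper (not a library API): count and remove
// U+200C ZERO WIDTH NON-JOINER characters from a string.
function stripZwnj(text) {
  const matches = text.match(/\u200C/g) || [];
  return {
    count: matches.length,                 // how many ZWNJ were found
    cleaned: text.replace(/\u200C/g, ""),  // the text without them
  };
}

// Example: ha + virama + ZWNJ + ma
const result = stripZwnj("\u09B9\u09CD\u200C\u09AE");
console.log(result.count, result.cleaned.length);
```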
I am not sure why those control characters have crept in. I am using WordPress to add content, by the way.
Which software did you use to observe them?
I tried using Notepad++ on Windows. I copied the problematic text from the PDF, pasted it into Notepad++, and then chose 'Show All Characters'. I saw nothing strange.
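If an editor does not display zero-width characters, another way to look for them is to list the code points of the pasted text directly. This is only a sketch; `listCodePoints` is a hypothetical name, and it assumes the text has been copied into a JavaScript string without being altered along the way.

```javascript
// Hypothetical helper: turn a string into a list of "U+XXXX" labels,
// one per code point, so invisible characters become visible.
function listCodePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Any U+200C in the pasted text would show up as its own entry.
console.log(listCodePoints("a\u200Cb").join(" "));
```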
The CSS rule partly fixed the problem. I still see misalignment in the letters.
I've attached a screenshot indicating where the misalignment starts. The offending letters appear slightly lower than the rest, as you'll be able to notice.
At this point, I am unable to ascertain why the control characters are there. I'll update this thread once I have the information.
It seems that even after the U+200C control characters are removed, there is still a vertical alignment issue for the character U+09B9 (BENGALI LETTER HA), but only when it is followed by U+09CD (BENGALI SIGN VIRAMA). For some reason this mark results in the combined glyph moving down and breaking the baseline. We will continue our investigation.
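To see where the affected sequence occurs in a given text, something like the following could be used. It is a rough sketch with a hypothetical function name, `findHaVirama`, and it simply reports the string index of each U+09B9 that is immediately followed by U+09CD.

```javascript
// Hypothetical helper: return the index of every HA (U+09B9)
// that is immediately followed by VIRAMA (U+09CD).
function findHaVirama(text) {
  const positions = [];
  const pattern = /\u09B9\u09CD/g;
  let match;
  while ((match = pattern.exec(text)) !== null) {
    positions.push(match.index);
  }
  return positions;
}

// Example: ka, ha+virama, ma, ha+virama
console.log(findHaVirama("\u0995\u09B9\u09CD\u09AE\u09B9\u09CD"));
```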
It seems that the glyphs themselves are printed at different heights, without any OpenType positioning adjustments being made at all. That is very strange, as one would assume that the glyphs would all appear at the same vertical position by default.
Yes, as soon as we figure out exactly what is going on.
It seems to be related to the interplay of two different OpenType substitution features: "half" and "haln". Both of these substitution features apply to the sequence of ha+virama (halanta/hasanta?) characters mentioned earlier. However, the "half" feature produces the expected correct glyph, and the "haln" feature produces a glyph that is pushed down slightly. I don't know why, but this appears to be an error in the font, and can be confirmed with a font editor.
Still, the browsers make it work. It seems they are applying the "half" feature and Prince is not. The OpenType recommendations for Bengali are to apply both features, with "half" first. However, it appears that we only apply it to "pre-base" consonants, and we don't consider the ha character to be a pre-base consonant in this context. This may not be correct; unfortunately we are not experts in every script that we have implemented.
Checking the error console in Firefox, I get these errors:
Sorry, pressed post a bit too quickly there. Continuing: it seems that the font has some serious problems. Firefox doesn't like it, and I can't convince Chrome to load it at all. Given that other Bengali fonts appear to be working fine, this may be more of a font problem than a Prince problem.
If you have a specific text fragment, you could replace "a\A0 b" (where \A0 is the CSS escape for U+00A0, the no-break space) with "a b". If you want true regular expressions, you will need to do this with JavaScript.
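As a rough idea of what the JavaScript route might look like, here is a small sketch that replaces every no-break space in a string with an ordinary space using a regular expression. The function name `replaceNbsp` is hypothetical, and how the result gets applied to the text of your document is left out, since that depends on your setup.

```javascript
// Hypothetical helper: replace each U+00A0 NO-BREAK SPACE
// with a regular space, wherever it appears in the string.
function replaceNbsp(text) {
  return text.replace(/\u00A0/g, " ");
}

// "a", no-break space, "b" becomes "a b" with an ordinary space.
console.log(replaceNbsp("a\u00A0b") === "a b");
```

The pattern can be adjusted if only some occurrences should change, for example by matching surrounding characters as well.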