tesseract-005
Tests that the lstmbox output format works.
Test is expected to pass.
The pipeline
<p:declare-step xmlns:cx="http://xmlcalabash.com/ns/extensions"
xmlns:p="http://www.w3.org/ns/xproc"
xmlns:t="http://xproc.org/ns/testsuite/3.0" name="main" version="3.0">
<p:import href="https://xmlcalabash.com/ext/library/pdf-steps.xpl"/>
<p:import href="https://xmlcalabash.com/ext/library/tesseract.xpl"/>
<p:output port="result"/>
<cx:pdf-to-images dpi="300">
<p:with-input port="source"
href="../documents/example.pdf"/>
</cx:pdf-to-images>
<cx:tesseract language="eng" output-format="lstmbox"
debug-output="/dev/null"/>
<p:wrap-sequence wrapper="text"/>
</p:declare-step>
Result
<text xmlns:t="http://xproc.org/ns/testsuite/3.0">P 200 3206 591 3270 0
D 200 3206 591 3270 0
F 200 3206 591 3270 0
200 3206 591 3270 0
T 200 3206 591 3270 0
e 200 3206 591 3270 0
x 200 3206 591 3270 0
t 200 3206 591 3270 0
200 3206 591 3270 0
T 191 3076 847 3120 0
h 191 3076 847 3120 0
i 191 3076 847 3120 0
s 191 3076 847 3120 0
191 3076 847 3120 0
i 191 3076 847 3120 0
s 191 3076 847 3120 0
191 3076 847 3120 0
a 191 3076 847 3120 0
191 3076 847 3120 0
s 191 3076 847 3120 0
a 191 3076 847 3120 0
m 191 3076 847 3120 0
p 191 3076 847 3120 0
l 191 3076 847 3120 0
e 191 3076 847 3120 0
191 3076 847 3120 0
P 191 3076 847 3120 0
D 191 3076 847 3120 0
F 191 3076 847 3120 0
191 3076 847 3120 0
d 191 3076 847 3120 0
o 191 3076 847 3120 0
c 191 3076 847 3120 0
u 191 3076 847 3120 0
m 191 3076 847 3120 0
e 191 3076 847 3120 0
n 191 3076 847 3120 0
t 191 3076 847 3120 0
. 191 3076 847 3120 0
191 3076 847 3120 0
206 2451 677 2929 0
206 2451 677 2929 0
W 190 2250 500 2295 0
i 190 2250 500 2295 0
t 190 2250 500 2295 0
h 190 2250 500 2295 0
190 2250 500 2295 0
a 190 2250 500 2295 0
n 190 2250 500 2295 0
190 2250 500 2295 0
i 190 2250 500 2295 0
m 190 2250 500 2295 0
a 190 2250 500 2295 0
g 190 2250 500 2295 0
e 190 2250 500 2295 0
. 190 2250 500 2295 0
190 2250 500 2295 0
</text>
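Each record in the result above holds a glyph followed by five numbers, and every glyph on a text line repeats the same four coordinates. As a rough illustration of how such output might be consumed, the following Python sketch groups consecutive records by their shared box to recover the text lines. It is not part of the test. The function names `parse_lstmbox` and `text_lines` are hypothetical, and two things are assumed rather than taken from this page: that the five numbers are left, bottom, right, top and page, and that a record showing only five numbers is a whitespace marker (a space or end-of-line) whose character is not visible in the listing.

```python
from itertools import groupby


def parse_lstmbox(text):
    """Yield (glyph, box, page) tuples from lstmbox-style text."""
    for raw in text.splitlines():
        # The last five whitespace-separated fields are numbers; anything
        # left in front of them is the glyph.
        fields = raw.rsplit(None, 5)
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        *glyph, left, bottom, right, top, page = fields
        box = (int(left), int(bottom), int(right), int(top))
        # Assumption: a record with no visible glyph is a whitespace marker.
        yield (glyph[0] if glyph else " ", box, int(page))


def text_lines(entries):
    """Join consecutive glyphs that share the same box and page."""
    for _, group in groupby(entries, key=lambda e: (e[1], e[2])):
        line = "".join(glyph for glyph, _, _ in group).strip()
        if line:
            yield line


# A few records copied from the result above.
sample = "\n".join([
    "P 200 3206 591 3270 0",
    "D 200 3206 591 3270 0",
    "F 200 3206 591 3270 0",
    "200 3206 591 3270 0",
    "T 191 3076 847 3120 0",
    "h 191 3076 847 3120 0",
    "i 191 3076 847 3120 0",
    "s 191 3076 847 3120 0",
])

for line in text_lines(parse_lstmbox(sample)):
    print(line)
```

Grouping on the repeated box keeps the sketch independent of which whitespace character marks the end of a line, which matters here because that character cannot be read off the listing.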
Schematron checks
<s:schema xmlns:s="http://purl.oclc.org/dsdl/schematron"
xmlns:t="http://xproc.org/ns/testsuite/3.0" queryBinding="xslt2">
<s:pattern>
<s:rule context="/">
<s:assert test="text">Wrong document element</s:assert>
</s:rule>
</s:pattern>
<s:pattern>
<s:rule context="/text">
<s:assert test="starts-with(., 'P 200')">Wrong text</s:assert>
</s:rule>
</s:pattern>
</s:schema>
Revision history
- 12 Jun 2026, Norm Tovey-Walsh
- Created test.