Package: orderanalyzer 1.0.0

orderanalyzer: Extracting Order Position Tables from PDF-Based Order Documents

Functions for extracting text and tables from PDF-based order documents. The package provides an n-gram-based approach for identifying the language of an order document. It uses the R package 'pdftools' to extract the text from an order document. If the PDF document contains only an image (because it is a scanned document), the R package 'tesseract' is used for OCR. The package also provides functionality for identifying and extracting order position tables in order documents based on a clustering approach.
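To make the idea of n-gram-based language identification concrete, here is a minimal base-R sketch. It is not the implementation used by orderanalyzer: the helper names (`ngram_profile`, `profile_similarity`, `guess_language`), the two short reference texts, and the choice of character trigrams with cosine similarity are all assumptions made for illustration.

```r
# Illustrative sketch only: a tiny character-trigram language guesser.
# All function names and reference texts below are hypothetical and are
# NOT part of the orderanalyzer API.

# Turn a string into relative frequencies of overlapping character n-grams.
ngram_profile <- function(text, n = 3) {
  # Lower-case, collapse non-letters to single spaces, pad with spaces.
  text <- tolower(gsub("[^[:alpha:]]+", " ", text))
  text <- paste0(" ", trimws(text), " ")
  if (nchar(text) < n) return(numeric(0))
  starts <- seq_len(nchar(text) - n + 1)
  grams <- substring(text, starts, starts + n - 1)
  counts <- table(grams)
  setNames(as.numeric(counts) / sum(counts), names(counts))
}

# Cosine similarity between two n-gram profiles (0 when nothing is shared).
profile_similarity <- function(p, q) {
  shared <- intersect(names(p), names(q))
  if (length(shared) == 0) return(0)
  sum(p[shared] * q[shared]) / (sqrt(sum(p^2)) * sqrt(sum(q^2)))
}

# Return the name of the reference text whose profile is most similar.
guess_language <- function(text, reference_texts, n = 3) {
  target <- ngram_profile(text, n)
  scores <- vapply(
    reference_texts,
    function(ref) profile_similarity(target, ngram_profile(ref, n)),
    numeric(1)
  )
  names(scores)[which.max(scores)]
}

# Toy reference corpus (ASCII only, far too small for real use).
reference_texts <- c(
  en = "please deliver the following items to the address stated on this order the total amount is due within thirty days",
  de = "bitte liefern sie die folgenden artikel an die angegebene adresse der gesamtbetrag ist innerhalb von dreissig tagen zahlbar"
)

guess_language("We would like to order the following positions", reference_texts)
guess_language("Hiermit bestellen wir die folgenden Positionen", reference_texts)
```

A realistic language identifier would build its profiles from much larger corpora per language. The sketch only shows the general mechanism: compare the n-gram distribution of the document against a per-language reference distribution and pick the closest match.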

Authors: Michael Scholz [cre, aut], Joerg Bauer [aut]

orderanalyzer_1.0.0.tar.gz
orderanalyzer_1.0.0.zip (r-4.5), orderanalyzer_1.0.0.zip (r-4.4), orderanalyzer_1.0.0.zip (r-4.3)
orderanalyzer_1.0.0.tgz (r-4.4-any), orderanalyzer_1.0.0.tgz (r-4.3-any)
orderanalyzer_1.0.0.tar.gz (r-4.5-noble), orderanalyzer_1.0.0.tar.gz (r-4.4-noble)
orderanalyzer_1.0.0.tgz (r-4.4-emscripten), orderanalyzer_1.0.0.tgz (r-4.3-emscripten)
orderanalyzer.pdf | orderanalyzer.html
orderanalyzer/json (API)

# Install 'orderanalyzer' in R:
install.packages('orderanalyzer', repos = c('https://michael-scholz-dev.r-universe.dev', 'https://cloud.r-project.org'))

This package does not link to any Github/Gitlab/R-forge repository. No issue tracker or development information is available.

Score: 1.00. Exports: 3. Dependencies: 39.

Last updated 16 days ago from: 1a49489785. Checks: OK: 7. Indexed: yes.

Target           Result  Date
Doc / Vignettes  OK      Dec 13 2024
R-4.5-win        OK      Dec 13 2024
R-4.5-linux      OK      Dec 13 2024
R-4.4-win        OK      Dec 13 2024
R-4.4-mac        OK      Dec 13 2024
R-4.3-win        OK      Dec 13 2024
R-4.3-mac        OK      Dec 13 2024

Exports: extractTables, extractText, identifyLanguage

Dependencies: cli, cpp11, data.table, digest, dplyr, fansi, fastmatch, generics, glue, ISOcodes, jsonlite, lattice, lifecycle, lubridate, magrittr, Matrix, matrixcalc, pillar, pkgconfig, purrr, quanteda, R6, Rcpp, rlang, rlist, SnowballC, stopwords, stringi, stringr, tibble, tidyr, tidyselect, timechange, utf8, vctrs, withr, XML, xml2, yaml