Package: orderanalyzer 1.0.0
orderanalyzer: Extracting Order Position Tables from PDF-Based Order Documents
Functions for extracting text and tables from PDF-based order documents. The package provides an n-gram-based approach for identifying the language of an order document. It uses the R package 'pdftools' to extract the text from an order document. If the PDF document contains only an image (because it is a scanned document), the R package 'tesseract' is used for OCR. The package also provides functionality for identifying and extracting order position tables in order documents based on a clustering approach.
Authors:
orderanalyzer_1.0.0.tar.gz
orderanalyzer_1.0.0.zip (r-4.5), orderanalyzer_1.0.0.zip (r-4.4), orderanalyzer_1.0.0.zip (r-4.3)
orderanalyzer_1.0.0.tgz (r-4.4-any), orderanalyzer_1.0.0.tgz (r-4.3-any)
orderanalyzer_1.0.0.tar.gz (r-4.5-noble), orderanalyzer_1.0.0.tar.gz (r-4.4-noble)
orderanalyzer_1.0.0.tgz (r-4.4-emscripten), orderanalyzer_1.0.0.tgz (r-4.3-emscripten)
orderanalyzer.pdf | orderanalyzer.html
orderanalyzer/json (API)
# Install 'orderanalyzer' in R:
install.packages('orderanalyzer', repos = c('https://michael-scholz-dev.r-universe.dev', 'https://cloud.r-project.org'))
This package does not link to any GitHub/GitLab/R-Forge repository. No issue tracker or development information is available.
Last updated 16 days ago from:1a49489785. Checks: OK: 7. Indexed: yes.
Target | Result | Date |
---|---|---|
Doc / Vignettes | OK | Dec 13 2024 |
R-4.5-win | OK | Dec 13 2024 |
R-4.5-linux | OK | Dec 13 2024 |
R-4.4-win | OK | Dec 13 2024 |
R-4.4-mac | OK | Dec 13 2024 |
R-4.3-win | OK | Dec 13 2024 |
R-4.3-mac | OK | Dec 13 2024 |
Exports: extractTables, extractText, identifyLanguage
Dependencies: cli, cpp11, data.table, digest, dplyr, fansi, fastmatch, generics, glue, ISOcodes, jsonlite, lattice, lifecycle, lubridate, magrittr, Matrix, matrixcalc, pillar, pkgconfig, purrr, quanteda, R6, Rcpp, rlang, rlist, SnowballC, stopwords, stringi, stringr, tibble, tidyr, tidyselect, timechange, utf8, vctrs, withr, XML, xml2, yaml
Readme and manuals
Help Manual
Help page | Topics |
---|---|
Extracting order position tables from PDF-based order documents | orderanalyzer-package orderanalyzer |
Extract tables from a given words-dataframe | extractTables |
Extracts the text from a PDF file | extractText |
Identifies the language of a given text based on frequent trigrams | identifyLanguage |
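
The help topics above mention language identification based on frequent trigrams. This page does not show how `identifyLanguage` is implemented, so the following is only a minimal base-R sketch of the general idea: build a profile of character trigrams per language and pick the language whose profile overlaps most with the input text. The helper names `char_trigrams`, `top_trigrams`, and `guess_language`, the cut-off `k`, and the two toy training sentences are all assumptions made for illustration and are not part of orderanalyzer.

```r
# Hypothetical helper: split a text into overlapping character trigrams.
char_trigrams <- function(text) {
  text <- tolower(gsub("\\s+", " ", text))  # normalise case and whitespace
  n <- nchar(text)
  if (n < 3) return(character(0))
  substring(text, 1:(n - 2), 3:n)
}

# Hypothetical helper: the k most frequent trigrams of a training text.
# With the short toy texts below, k = 300 keeps every trigram.
top_trigrams <- function(text, k = 300) {
  counts <- sort(table(char_trigrams(text)), decreasing = TRUE)
  names(counts)[seq_len(min(k, length(counts)))]
}

# Toy profiles (assumption): real profiles would be built from large corpora.
profiles <- list(
  english = top_trigrams(paste(
    "the quantity and the price of the order are listed",
    "in this table with the delivery date")),
  german = top_trigrams(paste(
    "die menge und der preis der bestellung sind in dieser",
    "tabelle mit dem liefertermin angegeben"))
)

# Hypothetical helper: choose the profile sharing the most trigrams
# with the input text.
guess_language <- function(text, profiles) {
  tri <- unique(char_trigrams(text))
  overlap <- vapply(profiles, function(p) sum(tri %in% p), numeric(1))
  names(which.max(overlap))
}

guess_language("please confirm the order and the delivery date", profiles)
```

Counting shared trigrams is the simplest possible scoring rule; a rank-based or frequency-weighted distance would be a natural refinement once the profiles come from larger training texts.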