Dewarping dataset

a) Grayscale camera-captured document image portion from the CBDAR 2007 dewarping contest dataset.

Paper: Fourier Document Restoration for Robust Document Dewarping and Recognition (CVPR 2022). Date: March 2022.

The performance of the proposed method and of other existing methods on the 'DFKI dewarping contest dataset', measured by the ratio C1/C2, is presented as a box plot in Fig. 12.

In this section, we introduce our DocW dataset, a new challenging dataset specifically designed for the dewarping of multi-scale document images.

Our key insight is that a document is well dewarped when its distorted feature lines have been transformed into axis-aligned ones.

The benchmark dataset for alphabetic script, the 'DFKI dewarping contest dataset' [43], with 102 images, is used to test the performance of the proposed method along with other state-of-the-art methods on alphabetic scripts.

To demonstrate the performance of our algorithm on real-world documents, we evaluate it on the CBDAR document image dewarping contest dataset [4].

Due to the lack of annotated line features in current public dewarping datasets, we also propose an automatic fine-grained annotation method that uses public document texture images and an automatic rendering engine to build a new large-scale distortion training dataset.

For dewarping, the Doc3D dataset (das2019dewarpnet), which contains 100,000 synthetic images, and the UVDoc dataset (verhoeven2023uvdoc), which contains 20K synthetic images, are used for training.

If you use the dataset or this code, please consider citing our work. This code is heavily based on pytorch-semseg.
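The axis-alignment insight can be turned into a simple check. The sketch below is not the metric of any cited paper; it assumes a feature line is available as a list of (x, y) points, and the helper name line_alignment_error is hypothetical. It fits a least-squares line and reports its tilt from the horizontal axis together with the RMS residual, which captures remaining curl. Both values are close to 0 for a well-dewarped, horizontal text line.

```python
import math


def line_alignment_error(points):
    """Return (angle_deg, rms_residual) for a feature line given as (x, y) points.

    Fits y = slope * x + intercept by least squares. angle_deg is the tilt of
    that line from the horizontal axis; rms_residual measures how far the
    points bend away from it (curl). Both are ~0 for a straight, horizontal line.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("points must span more than one x value")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rms = math.sqrt(sum((y - (slope * x + intercept)) ** 2 for x, y in points) / n)
    return math.degrees(math.atan(slope)), rms


# Usage: a flat line versus a curled (parabolic) one.
flat = [(x, 10.0) for x in range(11)]
curled = [(x, 0.05 * (x - 5) ** 2) for x in range(11)]
print(line_alignment_error(flat))    # (0.0, 0.0)
print(line_alignment_error(curled))  # angle ~0, residual > 0
```

A curled line symmetric about its center has no net tilt, so only the residual term flags it; a skewed but straight line shows the opposite pattern.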
It consists of three modules: document image augmentation and feature extraction, sequence modeling, and contextual …

Without bells and whistles, we re-train two popular document dewarping models on our registered document dataset WarpDoc-R and obtain performance superior to that of models trained on almost 100 times as much synthetic data, verifying the label quality of our document registration method.

Abstract: Document dewarping is crucial for many applications.

From the review of dewarping techniques for warped camera-captured document images, it is noticed that …

Using classical computer vision (CV) methods makes the document topology restoration process faster and more efficient, as it requires significantly less computation and memory.

We present a novel high-resolution dataset with template information, 3D renderings, a multiplicity of supervision signal maps, and backward transforms to enable designated learning of structural features for image dewarping.

Training datasets are one of the key factors in building document dewarping methods, and much work has been devoted to contributing synthetic or real document datasets.

A comprehensive list of awesome document image rectification methods based on deep learning.

The code and pixel-level labels will be released.

We also propose an algorithm that handles these warp complexities equally effectively on this dataset and on earlier datasets.

Furthermore, to train the scale-aware stage network and validate the effectiveness of ScaleDoc, a new document dataset, DocW, has been created.

The experiment is conducted by training the proposed network on the synthetic UW3 dataset we prepared; evaluation is done on the UW3 dataset and, additionally, on another dataset of originally distorted images.
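Backward transforms tell each pixel of the flat output where to read from in the distorted input. A minimal sketch of applying such a map with bilinear interpolation follows. It assumes the map stores pixel coordinates as (x, y) pairs, which may not match a given dataset's storage convention (some use normalized coordinates or a different axis order), and the function names are hypothetical. On real arrays, OpenCV's cv2.remap or PyTorch's torch.nn.functional.grid_sample perform the same sampling.

```python
def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at float pixel coords (x, y).

    Coordinates outside the image are clamped to the nearest border pixel.
    """
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def apply_backward_map(image, bmap):
    """Unwarp `image` with a backward map.

    bmap[i][j] = (x, y) is the location in the distorted input that output
    pixel (row i, column j) should read from.
    """
    return [[bilinear_sample(image, x, y) for (x, y) in row] for row in bmap]


# Usage: a map that reads each output pixel half a pixel to the right.
img = [[0, 10], [20, 30]]
shift = [[(j + 0.5, i) for j in range(2)] for i in range(2)]
print(apply_backward_map(img, shift))  # [[5.0, 10.0], [25.0, 30.0]]
```

A backward (output-to-input) map is convenient because every output pixel receives exactly one value, whereas pushing input pixels forward can leave holes.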
I'm looking for any publicly available dataset that contains curved text lines (preferably one per image), like those from "Alignment of Curved Text Strings for Enhanced OCR Readability".

The Doc3D dataset can be downloaded using the provided scripts.

GitHub repository: NachappaCH/Dewarping-Dataset-Annotations.

In this paper, we propose a simple but effective method, named DocHFormer, that can take hierarchical prior features of images, including …

DocRes achieves new records on certain metrics for benchmark datasets related to dewarping, deshadowing, deblurring, and appearance enhancement tasks.

We further create a comprehensive benchmark that covers various real-world conditions.

We propose a novel image dewarping algorithm, which improves the state of the art by a considerable margin by leveraging additional …

Total-Text-Dataset

In CBDAR 2007, we introduced the first dataset (DFKI-I) of camera-captured document images in conjunction with a page dewarping contest.

When there are few aligned text-lines in the image, this usually means that photos, graphics, and/or tables take up a large portion of the input instead.

Therefore, we have created a dataset that involves complex warp distortions.

DFKI Dewarping Contest Dataset (CBDAR 2007): the dataset used in the CBDAR 2007 Dewarping Contest contains 102 camera-captured documents with their corresponding ASCII text ground truth.

The generated input dataset looks similar to the originally distorted test images.

We summarize mainstream document dewarping datasets in Figure 2 (a).

The dewarping stage introduces a lightweight method that dewarps documents by predicting document edges using sparse control points.

The dataset consists of 102 documents captured with a hand-held camera.
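Because the DFKI dataset pairs each image with its ASCII text ground truth, dewarping quality can also be judged indirectly: run OCR on the dewarped output and compare the recognized text with the ground truth. The sketch below computes a character error rate from the Levenshtein edit distance. Normalizing by the ground-truth length is an assumption made here, and the exact scoring protocol of the CBDAR 2007 contest may differ; the OCR step itself is not shown.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions that turn string a into string b (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def char_error_rate(ocr_text, ground_truth):
    """Edit distance normalized by ground-truth length (assumed convention)."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


print(levenshtein("kitten", "sitting"))              # 3
print(char_error_rate("helo world", "hello world"))  # one missing char out of 11
```

Lower values mean the dewarped page was easier to read; comparing the rate before and after dewarping isolates the contribution of the geometric correction.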
Training with Doc3D, we demonstrate state-of-the-art performance for DewarpNet through extensive qualitative and quantitative evaluations.

We begin with a review of existing datasets before detailing the newly introduced dataset.

This paper also provides a training dataset based on control points for document dewarping.

We developed a new pipeline for automatic document dewarping and reconstruction, along with a framework and an annotated dataset to demonstrate its efficiency.

scenetext: a synthetically generated dataset in which word instances are placed in natural scene images while taking the scene layout into account.

With the development of a series of advanced models, performance on various benchmark datasets has improved considerably, as evidenced by increasingly better quantitative results.

One of the main limitations of this dataset is that it contains images only from technical books with simple layouts and moderate curl/skew.

Our method leverages deep learning (DL) for document contour detection and employs cubic polynomial interpolation to create a topological 2D grid, which corrects nonlinear distortions through image remapping.

DocUNet dataset: a synthetic dataset for document image dewarping, generated by perturbing a mesh and applying a dense warping map.

The network is trained on this dataset with various data augmentations to improve its generalization ability.

We can see that most of these datasets are synthetic and contain rich pixel-level labels.
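The contour-plus-cubic-interpolation idea can be sketched as follows. Several details are assumptions and not taken from the cited method: the top and bottom page edges arrive as sampled (x, y) points from a contour detector, the left and right edges are treated as straight vertical lines at x_min and x_max, and grid rows are interpolated linearly between the two fitted cubics. The helpers fit_cubic and edge_grid are hypothetical. The result is a backward grid of source coordinates that a remapping step could sample; x is normalized to [-1, 1] before fitting to keep the normal equations well conditioned.

```python
def fit_cubic(points):
    """Least-squares cubic fit y = f(x) through (x, y) points; returns f."""
    xs = [x for x, _ in points]
    center = (max(xs) + min(xs)) / 2
    half = (max(xs) - min(xs)) / 2 or 1.0
    # Augmented normal equations [A^T A | A^T y] with basis 1, t, t^2, t^3.
    m = [[0.0] * 5 for _ in range(4)]
    for x, y in points:
        p = [((x - center) / half) ** k for k in range(4)]
        for r in range(4):
            m[r][4] += p[r] * y
            for c in range(4):
                m[r][c] += p[r] * p[c]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(4):
        piv = max(range(col, 4), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("need at least 4 distinct x values")
        for r in range(4):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    coeffs = [m[r][4] / m[r][r] for r in range(4)]
    return lambda x: sum(c * ((x - center) / half) ** k for k, c in enumerate(coeffs))


def edge_grid(top_pts, bottom_pts, x_min, x_max, rows, cols):
    """Backward grid: grid[i][j] = (x, y) in the distorted image for output
    row i, column j (rows, cols >= 2)."""
    top, bottom = fit_cubic(top_pts), fit_cubic(bottom_pts)
    grid = []
    for i in range(rows):
        v = i / (rows - 1)  # 0 on the top edge, 1 on the bottom edge
        row = []
        for j in range(cols):
            x = x_min + (x_max - x_min) * j / (cols - 1)
            row.append((x, (1 - v) * top(x) + v * bottom(x)))
        grid.append(row)
    return grid


# Usage: a page whose top edge sags quadratically and whose bottom edge is flat.
top_edge = [(x, 10 + 0.01 * x * x) for x in range(0, 101, 10)]
bottom_edge = [(x, 200.0) for x in range(0, 101, 10)]
grid = edge_grid(top_edge, bottom_edge, 0, 100, rows=3, cols=3)
print(grid[1][1])  # midpoint between top(50) = 35 and bottom = 200
```

Linear blending between the two edge curves is the simplest choice; it ignores any curl that is not visible on the page boundary.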
Experiments show that our approach can rectify document images with various distortion types and yields state-of-the-art performance on real-world datasets.

These models can be used to unwarp DocUNet images and reproduce the results in the ICCV paper.

The Document Image Dewarping (DID) task aims to correct geometric distortion and improve image quality.

Both the code and the dataset are released at https://github.com/gwxie/

Hence, for robust document dewarping, we propose to use line segments in the image in addition to the aligned text-lines.

The images were taken in different scenes (indoors, outdoors, etc.) under different illuminations.

However, several recent studies have revealed that the commonly used evaluation metrics may not consistently …

Conventional text-line based document dewarping methods have problems handling complex layouts and/or very few text-lines.

b) Poor binarization (adaptive thresholding) result, which has lost most of its details.

About: a new pipeline for the automatic dewarping and reconstruction of camera-captured documents, along with an associated framework and dataset.

The control points and reference points are composed of the same number of vertices and describe the shape of the document in the image before and after rectification, respectively.
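For reference on the binarization caption, a common mean-based adaptive threshold is sketched below. The caption does not state which variant produced panel b), so the window and offset values here are illustrative assumptions, and the function name is hypothetical. It is comparable in spirit to OpenCV's cv2.adaptiveThreshold with ADAPTIVE_THRESH_MEAN_C. An integral image (summed-area table) gives each local mean in constant time.

```python
def adaptive_mean_threshold(gray, window=15, offset=10):
    """Binarize a grayscale image (list of rows, values 0-255).

    A pixel becomes ink (0) if it is darker than the mean of its
    window x window neighbourhood minus `offset`; otherwise it becomes
    background (255). Windows are cropped at the image border.
    """
    h, w = len(gray), len(gray[0])
    # integral[i][j] = sum of gray[0..i-1][0..j-1]
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_sum = 0
        for j in range(w):
            row_sum += gray[i][j]
            integral[i + 1][j + 1] = integral[i][j + 1] + row_sum
    r = window // 2
    out = []
    for i in range(h):
        y0, y1 = max(0, i - r), min(h, i + r + 1)
        row = []
        for j in range(w):
            x0, x1 = max(0, j - r), min(w, j + r + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(0 if gray[i][j] < mean - offset else 255)
        out.append(row)
    return out


# Usage: one dark "ink" pixel on a bright page.
page = [[200] * 5 for _ in range(5)]
page[2][2] = 50
binary = adaptive_mean_threshold(page, window=3, offset=10)
print(binary[2][2])  # 0
```

Because the decision depends only on a fixed-size neighbourhood and offset, faint or thin strokes whose contrast falls below the offset are classified as background.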
25,000 images with full 2D and 3D annotations, created using 100 HTML templates for wide layout variety, with fully randomized realistic content.

GitHub repository: xiaomore/Document-Image-Dewarping.

The code and dataset will be publicly released.

To offer a comprehensive evaluation of document dewarping models, we construct a fine-grained benchmark dataset, AnyPhotoDoc6300, which contains 6,300 real-world photographic document image pairs rigorously organized across three distinct domains.

This dataset features multiple ground-truth annotations, including 3D shape, surface normals, UV map, albedo image, etc.

Updated on June 8, 2022: Added the digital document images with margins for the evaluation of image quality.

However, existing learning-based methods primarily rely on supervised regression with annotated data, without leveraging the inherent geometric properties of physical documents in the dewarping process.

This paper presents a novel and robust document image dewarping method, DocMamba, based on the idea of selective state space sequence modeling.

A simple yet effective approach to rectify distorted document images by estimating control points and reference points.

Recently, the task of document image dewarping has garnered significant attention.

Even for binarization tasks, where the dedicated SOTA model GDB still holds the top position, DocRes performs close behind it.
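Since control points and reference points have the same number of vertices, a regular grid of reference points on the flat output can be densified into a full backward map. The sketch below assumes the control points are predicted as a rows x cols grid and interpolates bilinearly inside each grid cell. The cited approach may use a different interpolant (thin-plate splines are a common alternative), and the function name is hypothetical.

```python
def control_points_to_backward_map(ctrl, out_h, out_w):
    """Densify a rows x cols grid of control points into a backward map.

    ctrl[r][c] = (x, y) is where reference point (r, c) of a regular output
    grid lies in the distorted image. Returns bmap[i][j] = (x, y) for every
    output pixel, bilinearly interpolated inside the enclosing grid cell.
    Requires rows, cols, out_h, out_w >= 2.
    """
    rows, cols = len(ctrl), len(ctrl[0])
    bmap = []
    for i in range(out_h):
        gv = i * (rows - 1) / (out_h - 1)   # fractional grid row
        r0 = min(int(gv), rows - 2)         # last cell absorbs the edge
        fv = gv - r0
        row = []
        for j in range(out_w):
            gu = j * (cols - 1) / (out_w - 1)
            c0 = min(int(gu), cols - 2)
            fu = gu - c0
            (x00, y00), (x01, y01) = ctrl[r0][c0], ctrl[r0][c0 + 1]
            (x10, y10), (x11, y11) = ctrl[r0 + 1][c0], ctrl[r0 + 1][c0 + 1]
            x = ((1 - fv) * ((1 - fu) * x00 + fu * x01)
                 + fv * ((1 - fu) * x10 + fu * x11))
            y = ((1 - fv) * ((1 - fu) * y00 + fu * y01)
                 + fv * ((1 - fu) * y10 + fu * y11))
            row.append((x, y))
        bmap.append(row)
    return bmap


# Usage: a 3x3 control grid densified to a 5x5 output.
ctrl = [[(10 + 20 * c, 5 + 30 * r) for c in range(3)] for r in range(3)]
bmap = control_points_to_backward_map(ctrl, 5, 5)
print(bmap[2][2])  # the centre control point, (30.0, 35.0)
```

The resulting map can be fed to any bilinear image sampler (for example cv2.remap on arrays) to produce the rectified page. Predicting a sparse grid and densifying it keeps the network output small, which fits the lightweight, control-point-based designs mentioned above.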