Full metadata record
| DC field | Value | Language |
|---|---|---|
| dc.rights.license | Attribution-ShareAlike 4.0 International (CC BY-SA) | - |
| dc.contributor.advisor | Randall, Gregory | es |
| dc.contributor.advisor | Morel, Jean-Michel | es |
| dc.contributor.advisor | Mowlavi, Seginus | es |
| dc.contributor.advisor | Facciolo, Gabriele | es |
| dc.contributor.author | Belzarena, Diego | es |
| dc.date.accessioned | 2026-02-23T17:57:41Z | - |
| dc.date.available | 2026-02-23T17:57:41Z | - |
| dc.date.issued | 2025-09-30 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.12381/5447 | - |
| dc.description.abstract | Automatic Printed Text Recognition (APTR) is widely, but wrongly, considered a well-established digitization technology. Three figures summarize the gap: of the 129 million distinct printed books in libraries, 12 million have been scanned and only 5 million have been digitized, that is, converted into basic text. Scaling up digitization often requires automatic systems with error rates below 0.1%. Alternatively, any APTR algorithm used should be able to reliably estimate its error probability, to allow for downstream corrections. Current printed text recognition systems do not take into account the redundancy of character forms within a single document. The goal of this internship was to exploit that redundancy to develop document-specific font models, which could eventually be combined with stochastic language models and thus unlock scalability without compromising reliability. Furthermore, given the ability of the algorithm we developed to extract document-specific character prototypes, we proposed using them for an alternative application: printer identification of 17th-century Spanish theater plays. In doing so, we developed a method that shows potential to enable digital bibliography at a larger scale than has been possible so far. | es |
| dc.description.sponsorship | Agencia Nacional de Investigación e Innovación | es |
| dc.language.iso | eng | es |
| dc.publisher | École normale supérieure Paris-Saclay | es |
| dc.rights | Open access | * |
| dc.subject | Optical character recognition | es |
| dc.subject | Automatic Printed Text Recognition | es |
| dc.subject | Gaussian mixture models | es |
| dc.subject | Digital bibliography | es |
| dc.title | Comment briser le plafond de verre de la reconnaissance automatique de texte imprimé | es |
| dc.type | Master's thesis | es |
| dc.subject.anii | Natural and Exact Sciences | |
| dc.subject.anii | Mathematics | |
| dc.subject.anii | Applied Mathematics | |
| dc.subject.anii | Engineering and Technology | |
| dc.subject.anii | Electrical Engineering, Electronic Engineering, and Information Engineering | |
| dc.subject.anii | Social Sciences | |
| dc.subject.anii | Media and Communications | |
| dc.subject.anii | Library Science | |
| dc.identifier.anii | POS_EXT_2023_2_180123 | es |
| dc.type.version | Reviewed | es |
| dc.anii.subjectcompleto | //Natural and Exact Sciences/Mathematics/Applied Mathematics | es |
| dc.anii.subjectcompleto | //Engineering and Technology/Electrical Engineering, Electronic Engineering, and Information Engineering/Electrical Engineering, Electronic Engineering, and Information Engineering | es |
| dc.anii.subjectcompleto | //Social Sciences/Media and Communications/Library Science | es |
| Appears in collections: | Publicaciones de ANII | |
Files in this item:
| File | Description | Size | Format |
|---|---|---|---|
| Rapport_de_stage__Diego_Belzarena.pdf | Download | 16.53 MB | Adobe PDF |
Works in REDI are protected by Creative Commons licenses.
For more information on the terms of this publication, see:
Attribution-ShareAlike 4.0 International (CC BY-SA)
