fuksja/pdf_to_doc

pdf_to_doc

pdf_to_doc is a project to build a simple web page that lets a user easily convert a PDF document into an editable .doc file.

Contents

Author

Requirements and Limitations

Stakeholder requirements: there is a need for a tool for fast, automated conversion of non-editable PDF documents to the editable .doc format.

General description of functionality: the user goes to the upload page and uploads a PDF file, the conversion takes place, and the user receives a .doc file as output.

EDIT: new functionality added: conversion to .pptx format. The user uploads a file, chooses whether to convert it to .doc or .pptx, and receives the converted file.
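
The upload-and-convert flow described above could be sketched roughly as follows. This is a minimal sketch, not the project's actual code: `SUPPORTED_FORMATS`, `output_name`, and `pdf_to_docx` are hypothetical names introduced for illustration, and the conversion step assumes the `pdf2docx` package is installed. Note that pdf2docx writes `.docx` files, so the sketch uses that extension.

```python
from pathlib import Path

# Hypothetical set of target formats the user can choose from.
SUPPORTED_FORMATS = {"docx", "pptx"}


def output_name(pdf_name: str, target: str) -> str:
    """Return the converted file's name, e.g. report.pdf -> report.docx."""
    if target not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported target format: {target!r}")
    path = Path(pdf_name)
    if path.suffix.lower() != ".pdf":
        raise ValueError("only .pdf uploads are accepted")
    return path.with_suffix("." + target).name


def pdf_to_docx(pdf_path: str, docx_path: str) -> None:
    """Convert all pages with pdf2docx (imported lazily; must be installed)."""
    from pdf2docx import Converter

    cv = Converter(pdf_path)
    try:
        cv.convert(docx_path)  # converts every page by default
    finally:
        cv.close()
```

Keeping the file-name logic separate from the conversion call makes the format choice easy to check without running a real conversion.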

Assumptions:

  • the project uses only open-source software and is released under an open-source license
  • no conversion of encrypted files for now
  • all pages are converted by default
  • a custom maximum file size applies
  • no special security features
  • conversion from PDF to .pptx is simple: pages are placed in slides as images, with no OCR of text
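
The file-size and encryption assumptions above could be enforced with a small pre-conversion check. This is a hedged sketch only: `check_upload` and `MAX_UPLOAD_BYTES` are hypothetical names, the 10 MiB limit is an assumed value (this README does not state one), and the `/Encrypt` scan is a simple heuristic rather than a full PDF parse.

```python
from typing import Optional

# Assumed limit for illustration; the README does not give a number.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def check_upload(data: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> Optional[str]:
    """Return an error message for a rejected upload, or None if it passes."""
    if len(data) > max_bytes:
        return "file too large"
    # PDF files begin with a "%PDF-" header.
    if not data.startswith(b"%PDF-"):
        return "not a PDF file"
    # Heuristic: encrypted PDFs reference an /Encrypt dictionary.
    if b"/Encrypt" in data:
        return "encrypted PDFs are not supported"
    return None
```

A false positive is possible if the literal text `/Encrypt` happens to appear elsewhere in the file, so a stricter check would parse the document with a PDF library.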

Limitations:

  • English-language version only for now
  • no security features, user profiles, login option, or session control; simple file input and output only for now
  • for conversion to .doc format, limitations derived from the conversion method and the pdf2docx library:
    • text-based files only
    • left-to-right languages only
    • no rotation possible
    • no 1:1 layout conversion achievable
  • for conversion to .pptx format, limitations derived from the conversion method and the pdf2pptx library:
    • each page of the original file is rendered as a PNG image and inserted into a PowerPoint slide
    • slides are not editable and there is no OCR, but they can be presented as slides
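
The page-as-image approach behind the .pptx limitations can be illustrated as below. This is a sketch under stated assumptions, not pdf2pptx's actual code: it assumes the PyMuPDF (`fitz`) and python-pptx packages are installed, and `points_to_emu` and `pdf_to_image_slides` are hypothetical helper names.

```python
import io

# PowerPoint measures in EMU: 914400 per inch; PDF points are 72 per inch.
EMU_PER_POINT = 12700


def points_to_emu(points: float) -> int:
    """Convert a PDF length in points to PowerPoint's EMU units."""
    return round(points * EMU_PER_POINT)


def pdf_to_image_slides(pdf_path: str, pptx_path: str, dpi: int = 150) -> None:
    """Render every PDF page to PNG and place it on its own blank slide."""
    import fitz  # PyMuPDF, imported lazily
    from pptx import Presentation

    prs = Presentation()
    blank = prs.slide_layouts[6]  # "Blank" layout in the default template
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            width = points_to_emu(page.rect.width)
            height = points_to_emu(page.rect.height)
            if index == 0:
                # A deck has a single slide size; take it from the first page.
                prs.slide_width, prs.slide_height = width, height
            png = page.get_pixmap(dpi=dpi).tobytes("png")
            slide = prs.slides.add_slide(blank)
            slide.shapes.add_picture(io.BytesIO(png), 0, 0, width=width, height=height)
    prs.save(pptx_path)
```

Because each slide holds only a picture, the text on it cannot be selected or edited, which matches the limitation listed above.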

Getting started

Time frame

The first part of the project was completed in June 2022. The second part, adding the .pptx feature, was completed in July. The project will be updated in the future.

Documentation

This GitHub repository serves as the project's documentation.

License and copyright notice

This project uses the GPLv3 and MIT licenses. Parts of this project are derived from other software, created by other programmers or the community or produced in other ways, that is also released under the GNU General Public License v3.0:

Source of the pdf2docx library, used for file conversion to .doc
License

Source of the pdf2pptx library, used for file conversion to .pptx
License