Usage Overview

Paperless is an application that manages your personal documents. With the help of a document scanner (see Scanner recommendations), paperless transforms your unwieldy physical document binders into a searchable archive and provides many utilities for finding and managing your documents.

Terms and definitions

Paperless essentially consists of two different parts for managing your documents:

  • The consumer watches a specified folder and adds all documents in that folder to paperless.

  • The web server provides a UI that you use to manage and search for your scanned documents.

Each document has a number of fields that you can assign to it:

  • A Document is a piece of paper that sometimes contains valuable information.

  • The correspondent of a document is the person, institution or company that a document either originates from, or is sent to.

  • A tag is a label that you can assign to documents. Think of tags as more powerful folders: multiple documents can be grouped together with a single tag, but a single document can also have multiple tags, which is not possible with folders. Paperless does not implement folders simply because tags are much more versatile.

  • A document type is used to demarcate the type of a document, such as letter, bank statement, invoice, contract, etc. It is used to identify what a document is about.

  • The date added of a document is the date the document was scanned into paperless. You cannot and should not change this date.

  • The date created of a document is the date the document was initially issued. This can be the date you bought a product, the date you signed a contract, or the date a letter was sent to you.

  • The archive serial number (short: ASN) of a document is the identifier of the document in your physical document binders. See The recommended workflow below.

  • The content of a document is the text that was OCR’ed from the document. This text is fed into the search engine and is used for matching tags, correspondents and document types.
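To make the relationships between these fields concrete, here is a minimal sketch of a document record as a Python dataclass. It is a hypothetical, simplified model (the class `Document` and the helper `with_tag` are illustrative names, not paperless’s actual database schema): a document has one correspondent and one document type at most, but any number of tags.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class Document:
    # Hypothetical, simplified model; not paperless's actual database schema.
    title: str
    content: str                         # OCR'ed text, used for search and matching
    created: date                        # when the document was issued
    added: date                          # when it was scanned in; left unchanged
    correspondent: Optional[str] = None  # one correspondent, or none
    document_type: Optional[str] = None  # one type, e.g. "invoice", or none
    tags: Set[str] = field(default_factory=set)  # any number of tags
    asn: Optional[int] = None            # archive serial number in your binders


def with_tag(docs: List[Document], tag: str) -> List[Document]:
    """Return every document carrying `tag`, in their original order."""
    return [d for d in docs if tag in d.tags]


docs = [
    Document("Phone bill", "...", date(2020, 11, 3), date(2020, 11, 5),
             correspondent="Telco", document_type="invoice",
             tags={"phone", "taxes"}, asn=1),
    Document("Tax notice", "...", date(2021, 3, 1), date(2021, 3, 2),
             document_type="letter", tags={"taxes"}, asn=2),
]

# The phone bill shows up under both "phone" and "taxes"; a folder could
# only hold it in one place.
print([d.title for d in with_tag(docs, "taxes")])  # ['Phone bill', 'Tax notice']
print([d.title for d in with_tag(docs, "phone")])  # ['Phone bill']
```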

Frontend overview

Warning

TBD. Add some fancy screenshots!

Adding documents to paperless

Once you’ve got Paperless set up, you need to start feeding documents into it. Currently, there are three options: the consumption directory, IMAP (email), and HTTP POST.

The consumption directory

The primary method of getting documents into your database is by putting them in the consumption directory. The consumer runs in an infinite loop, looking for new additions to this directory. When it finds them, it runs OCR on them, indexes what it finds, and stores the documents in the media directory.
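The loop described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the consumer’s real implementation: the fixed polling `interval` is an assumption (a real consumer may react to filesystem events instead), and `process` is a hypothetical callback standing in for OCR and indexing.

```python
import shutil
import tempfile
import time
from pathlib import Path
from typing import Callable, List


def consume_once(consume_dir: Path, media_dir: Path,
                 process: Callable[[Path], None]) -> List[Path]:
    """Handle every file currently in consume_dir, then move it to media_dir."""
    consumed = []
    # sorted() materialises the listing first, so moving files is safe.
    for path in sorted(p for p in consume_dir.iterdir() if p.is_file()):
        process(path)                        # stand-in for OCR and indexing
        target = media_dir / path.name
        shutil.move(str(path), str(target))  # the file leaves the consumption dir
        consumed.append(target)
    return consumed


def consume_forever(consume_dir: Path, media_dir: Path,
                    process: Callable[[Path], None], interval: float = 10.0) -> None:
    """The "infinite loop": check the directory, sleep, repeat."""
    while True:
        consume_once(consume_dir, media_dir, process)
        time.sleep(interval)


# Usage: simulate one pass with a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    consume_dir = Path(tmp, "consume")
    media_dir = Path(tmp, "media")
    consume_dir.mkdir()
    media_dir.mkdir()
    (consume_dir / "scan.pdf").write_bytes(b"%PDF-1.4 placeholder")
    processed = []
    consume_once(consume_dir, media_dir, lambda p: processed.append(p.name))
    remaining = [p.name for p in consume_dir.iterdir()]
    stored = [p.name for p in media_dir.iterdir()]

print(processed, remaining, stored)  # ['scan.pdf'] [] ['scan.pdf']
```

Moving each file out of the consumption directory after handling it is what keeps the same file from being picked up twice on the next pass.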

Getting stuff into this directory is up to you. If you’re running Paperless on your local computer, you might just want to drag and drop files there. If you’re running it on a server and want your scanner to push files to this directory automatically, you’ll need to set up some sort of service that accepts files from the scanner. Typically, you’re looking at an FTP server like ProFTPD or a Windows folder share with Samba.
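If you write your own script to push files into the consumption directory, one general precaution for any watched folder is to write the file somewhere else first and then rename it into place, so a watcher never sees a half-written file. The sketch below assumes a hypothetical staging directory on the same filesystem as the consumption directory; `drop_into_consume_dir` is an illustrative name, not part of paperless.

```python
import os
import tempfile
from pathlib import Path


def drop_into_consume_dir(data: bytes, name: str,
                          staging_dir: Path, consume_dir: Path) -> Path:
    """Write `data` into staging_dir, then rename it into consume_dir.

    os.replace() is atomic only when both directories live on the same
    filesystem; across filesystems it may fail instead.
    """
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes are on disk first
        target = consume_dir / name
        os.replace(tmp_path, target)  # the file appears in one step
        return target
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # don't leave half-written files behind
        raise


# Usage with throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp, "staging")
    consume = Path(tmp, "consume")
    staging.mkdir()
    consume.mkdir()
    target = drop_into_consume_dir(b"scanned bytes", "scan.pdf", staging, consume)
    content = target.read_bytes()
    leftovers = list(staging.iterdir())

print(content, leftovers)  # b'scanned bytes' []
```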

IMAP (Email)

You can tell paperless-ng to consume documents from your email accounts. This is a very flexible and powerful feature if you regularly receive documents via email that you need to archive. The mail consumer can be configured in the admin interface as follows:

  1. Define e-mail accounts.

  2. Define mail rules for your account.

These rules perform the following:

  1. Connect to the mail server.

  2. Fetch all matching mails (as defined by folder, maximum age and the filters).

  3. Check if there are any consumable attachments.

  4. If so, instruct paperless to consume the attachments and optionally use the metadata provided in the rule for the new document.

  5. If documents were consumed from a mail, the rule action is performed on that mail.
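The five steps above can be sketched as a small, self-contained simulation. Everything here is hypothetical: the mailbox is an in-memory list instead of a real IMAP connection, the field names on `MailRule` are modelled on the filters described above rather than taken from paperless, and "consumable" is reduced to an assumed file-extension check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional

# Assumption: "consumable" is simplified to a file-extension check here.
CONSUMABLE_SUFFIXES = (".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff")


@dataclass
class Mail:
    folder: str
    sender: str
    subject: str
    received: datetime
    attachments: List[str]
    seen: bool = False


@dataclass
class MailRule:
    # Hypothetical field names modelled on the filters described above.
    folder: str = "INBOX"
    maximum_age: timedelta = timedelta(days=30)
    filter_from: Optional[str] = None
    filter_subject: Optional[str] = None
    assign_tag: Optional[str] = None  # metadata for the new documents

    def matches(self, mail: Mail, now: datetime) -> bool:
        return (mail.folder == self.folder
                and now - mail.received <= self.maximum_age
                and (self.filter_from is None or self.filter_from in mail.sender)
                and (self.filter_subject is None
                     or self.filter_subject in mail.subject))


def run_rule(rule: MailRule, mailbox: List[Mail], now: datetime,
             consume: Callable[[str, Optional[str]], None],
             action: Callable[[Mail], None]) -> None:
    # Step 1 (connecting) is simulated: `mailbox` stands in for the server.
    for mail in mailbox:
        if not rule.matches(mail, now):    # step 2: only matching mails
            continue
        docs = [a for a in mail.attachments
                if a.lower().endswith(CONSUMABLE_SUFFIXES)]  # step 3
        if not docs:
            continue                       # nothing consumed, so no action
        for name in docs:                  # step 4: consume attachments
            consume(name, rule.assign_tag)
        action(mail)                       # step 5: perform the rule action


def mark_as_read(mail: Mail) -> None:
    mail.seen = True


now = datetime(2021, 1, 31)
mailbox = [
    Mail("INBOX", "billing@telco.example", "Your invoice",
         datetime(2021, 1, 20), ["invoice.pdf"]),
    Mail("INBOX", "billing@telco.example", "Newsletter",
         datetime(2021, 1, 25), ["news.txt"]),
    Mail("INBOX", "billing@telco.example", "Old invoice",
         datetime(2020, 6, 1), ["old.pdf"]),
    Mail("INBOX", "friend@example.com", "Notes",
         datetime(2021, 1, 28), ["notes.pdf"]),
]
consumed = []
rule = MailRule(filter_from="telco", assign_tag="phone")
run_rule(rule, mailbox, now,
         lambda name, tag: consumed.append((name, tag)), mark_as_read)
print(consumed)                   # [('invoice.pdf', 'phone')]
print([m.seen for m in mailbox])  # [True, False, False, False]
```

Only the first mail is both matched and has a consumable attachment, so it is the only one the action touches; the others are left exactly as they were.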

Paperless will completely ignore mails that do not match your filters. It will also only perform the action on mails that it has consumed documents from.

The actions all ensure that the same mail is not consumed twice by different means. These are as follows:

  • Delete: Immediately deletes mail that paperless has consumed documents from. Use with caution.

  • Mark as read: Mark consumed mail as read. Paperless will not consume documents from already read mails. If you read a mail before paperless sees it, it will be ignored.

  • Flag: Sets the ‘important’ flag on mails with consumed documents. Paperless will not consume flagged mails.

  • Move to folder: Moves consumed mails out of the way so that paperless won’t consume them again.
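The property shared by all four actions, that a consumed mail is no longer eligible afterwards, can be modelled as below. This is a hypothetical in-memory model of the behaviour described in the list, not how paperless talks to an IMAP server; the names `Action`, `apply_action`, `is_candidate` and the target folder "Processed" are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DELETE = "delete"
    MARK_READ = "mark_read"
    FLAG = "flag"
    MOVE = "move"


@dataclass
class Mail:
    folder: str = "INBOX"
    seen: bool = False
    flagged: bool = False
    deleted: bool = False


def apply_action(mail: Mail, action: Action, target_folder: str = "Processed") -> None:
    """Perform the rule action on a mail whose documents were consumed."""
    if action is Action.DELETE:
        mail.deleted = True
    elif action is Action.MARK_READ:
        mail.seen = True
    elif action is Action.FLAG:
        mail.flagged = True
    elif action is Action.MOVE:
        mail.folder = target_folder


def is_candidate(mail: Mail, action: Action, source_folder: str = "INBOX") -> bool:
    """Would a rule using `action` still look at this mail?"""
    if mail.deleted or mail.folder != source_folder:
        return False              # deleted or moved out of the way
    if action is Action.MARK_READ:
        return not mail.seen      # read mails are skipped
    if action is Action.FLAG:
        return not mail.flagged   # flagged mails are skipped
    return True


for action in Action:
    mail = Mail()
    before = is_candidate(mail, action)
    apply_action(mail, action)
    print(action.name, before, is_candidate(mail, action))  # e.g. DELETE True False
```

The model also shows the caveat for "Mark as read": a mail created with `seen=True` (read by you before paperless saw it) is never a candidate under that action.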

Caution

The mail consumer will perform these actions on all mails it has consumed documents from. Keep in mind that the actual consumption process may fail for some reason, leaving you with missing documents in paperless.

Note

With the correct set of rules, you can completely automate your email documents. Create rules for every correspondent you receive digital documents from and paperless will read them automatically. The default action “mark as read” is pretty tame and will not cause any damage or data loss whatsoever.

Note

Paperless will process the rules in the order defined in the admin page.

You can define catch-all rules and have them executed last to consume any documents not matched by previous rules. Such a rule may assign an “Unknown mail document” tag to consumed documents so you can inspect them further.
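Why a catch-all rule placed last only sees leftovers can be sketched as follows. The assumptions: every rule uses the “mark as read” action, mails are held in memory, and the `order` field is a hypothetical stand-in for the rule order on the admin page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Mail:
    sender: str
    attachments: List[str]
    seen: bool = False


@dataclass
class Rule:
    name: str
    order: int                      # hypothetical: position on the admin page
    matches: Callable[[Mail], bool]
    tag: Optional[str] = None


def process_rules(rules: List[Rule], mails: List[Mail]) -> Dict[str, Optional[str]]:
    """Run rules in ascending `order`. Every rule here uses "mark as read",
    so a mail consumed by an earlier rule is skipped by later ones."""
    assigned = {}
    for rule in sorted(rules, key=lambda r: r.order):
        for mail in mails:
            if mail.seen or not mail.attachments or not rule.matches(mail):
                continue
            for name in mail.attachments:
                assigned[name] = rule.tag
            mail.seen = True
    return assigned


rules = [
    Rule("catch-all", order=99, matches=lambda m: True,
         tag="Unknown mail document"),
    Rule("telco", order=1, matches=lambda m: "telco" in m.sender, tag="phone"),
]
mails = [
    Mail("billing@telco.example", ["invoice.pdf"]),
    Mail("shop@example.com", ["receipt.pdf"]),
]
result = process_rules(rules, mails)
print(result)
# {'invoice.pdf': 'phone', 'receipt.pdf': 'Unknown mail document'}
```

Although the catch-all is listed first, it runs last because of its `order`, and by then the telco invoice has already been marked as read.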

Paperless is set up to check your mails every 10 minutes. This can be configured on the ‘Scheduled tasks’ page in the admin.

REST API

You can also submit a document using the REST API, see POSTing documents for details.
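As a rough sketch of what such a submission looks like, the snippet below builds a multipart/form-data POST using only the Python standard library. The endpoint path `/api/documents/post_document/`, the form field name `document`, and HTTP basic authentication are assumptions here; check them against the POSTing documents section for your version before relying on them. The helper `build_upload_request` is a hypothetical name.

```python
import base64
import urllib.request
import uuid


def build_upload_request(base_url: str, filename: str, data: bytes,
                         username: str, password: str) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one file in the "document" field."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/documents/post_document/",  # assumed endpoint
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f"Basic {credentials}",  # assumed: HTTP basic auth
        },
    )


req = build_upload_request("http://localhost:8000", "scan.pdf",
                           b"%PDF-1.4 placeholder", "admin", "secret")
print(req.get_method(), req.full_url)
# POST http://localhost:8000/api/documents/post_document/
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the example runnable without a server; `urllib.request.urlopen(req)` performs the actual upload.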