{- | Fourth assignment for IB016, spring semester 2019

== Analysis of the Ministry of Finance's invoices

In this assignment, you'll get to parse and analyze an open dataset of
invoices processed by the Ministry of Finance of the Czech Republic in 2018.

= The high-level concept

This assignment tackles a real problem, with the quirks, compromises, and
(sometimes) missing metadata that real data tend to have. Furthermore, you
are free to run the analyses you are actually interested in, not those
prescribed by us.

All this means that logic and common sense take priority over the details of
the task specification. If you find it more useful to do something slightly
differently, it may be allowed: ask in the discussion forum, describing what
you need and why. Nevertheless, we put a lot of thought into designing the
task reasonably, so in return, we ask you to think carefully before asking for
a deviation. Furthermore, we need to grade the solutions afterward, and having
the same interface (= data types) helps us a lot.

That being said, you can use this "real-world task" to practice combining
your functional programming skills. The author's solution uses quite a few
monadic operators, leverages monoids, and makes heavy use of functions as
return values (ehm, custom structures full of functions, in fact).

Overall, use common sense, enjoy the possibilities of functional programming
and learn about your homeland by examining the open data shared by the
government.

= Assignment overview

The assignment consists of these tasks:

  1. Locate the open data repository of the Ministry of Finance of the Czech
     Republic and download the CSV file with invoices paid in 2018
     (to be more precise, invoices paid between 2018-01-01 and 2018-11-01).
  2. Determine the license/use conditions of the dataset and summarize them
     in one sentence in the submitted source code.
  3. Write a Parsec parser for the CSV file according to the specification
     below.
  4. Write reasonable pretty-printers for the data (example given below).
  5. Using the parsed data, perform an analysis to find out something
     interesting. Report at least five results. See the details below.

= Parser details

Overall, the dataset contains more information than we are interested in.
See the documentation of the datatypes below to know what to keep (everything
else can be dropped at parse-time).

Write a parser that produces the structure given below reasonably efficiently.
Linear time complexity is not needed (not even realistic, since invoices
are stored in a map), but try to avoid intermediate data structures and
unnecessary re-processing of the data. A bad example would be parsing the whole
file into lines first, then iterating again to split the lines into CSV fields,
then making a third pass to convert dates, etc.
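
A minimal sketch of the single-pass idea: each field is converted to its
final type while it is being read, so no list of raw strings is ever built.
The @Row@ type, the helper names, the semicolon separator, the ISO date layout
and the decimal point are all assumptions made for illustration; check them
against the real file.

```haskell
import Data.Time.Calendar (Day, fromGregorianValid)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical, cut-down row; the assignment's 'Invoice' has more fields.
data Row = Row { rowId :: Int, rowIssued :: Day, rowAmount :: Double }
  deriving Show

-- ASSUMPTION: fields are separated by semicolons.
sep :: Parser Char
sep = char ';'

-- Non-negative integer, converted as soon as its digits are read.
int :: Parser Int
int = read <$> many1 digit

-- ASSUMPTION: dates look like 2018-05-09. Impossible dates fail the parser.
day :: Parser Day
day = do
  y <- read <$> count 4 digit
  m <- char '-' *> (read <$> count 2 digit)
  d <- char '-' *> (read <$> count 2 digit)
  maybe (parserFail "invalid date") pure (fromGregorianValid y m d)

-- ASSUMPTION: amounts use a decimal point and always have a fractional part.
money :: Parser Double
money = do
  whole <- many1 digit
  frac  <- char '.' *> many1 digit
  pure (read (whole ++ "." ++ frac))

-- One row, parsed and converted in a single pass over the input.
row :: Parser Row
row = Row <$> int <* sep <*> day <* sep <*> money
```

For instance, @parse row "" "1888800244;2015-07-01;9130.00"@ should yield a
@Row@ whose date is already a 'Day', with no separate conversion pass.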

Your program has to work on the original, unmodified (!) file downloaded
from the open data repository (this ensures reproducibility of your analyses
and shows you have not manipulated the data).
That being said, you can have "unparsed lines" after you process the file
(there is a dedicated list for these, see the datatype definition below).
This is allowed so that you don't have to bother with super-rare corner cases
that you would, in reality, probably clean up by hand. An example worth a
thousand words: three suppliers in the dataset have newlines in their names.
The ideal solution, of course, would be to allow these (and possibly other
escaped characters), but since there are just three among more than 7200
invoices, you need not bother. Please don't have more than a couple of such
unparsed lines.
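
One possible way to collect unparsed lines, sketched under simplifying
assumptions: try the structured parser first and, if it fails, fall back to
consuming the raw text up to the end of the line. The @record@ parser here is
a hypothetical stand-in (two integers separated by a semicolon), not the real
invoice format.

```haskell
import Data.Either (partitionEithers)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Non-negative integer.
int :: Parser Int
int = read <$> many1 digit

-- Hypothetical stand-in for the real record parser.
record :: Parser (Int, Int)
record = (,) <$> int <* char ';' <*> int

-- A line ends with a newline or with the end of the input.
eol :: Parser ()
eol = (() <$ endOfLine) <|> eof

-- Either a well-formed record or the raw text of the offending line.
-- 'try' rewinds to the start of the line when the record parser fails.
line :: Parser (Either String (Int, Int))
line = try (Right <$> record <* eol) <|> (Left <$> rawLine)
  where rawLine = many (noneOf "\n") <* eol

-- Split the input into unparsed lines and parsed records.
-- 'manyTill' checks for the end of input before each line, so the
-- fallback never loops on empty input.
parseAll :: String -> Either ParseError ([String], [(Int, Int)])
parseAll = fmap partitionEithers . parse (manyTill line eof) "<input>"
```

With this sketch, a record broken by an embedded newline would most likely
end up as raw text in the first component, which matches the "couple of
unparsed lines" allowance above.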

As for the pretty-printers, use the format that seems the most understandable
to you when doing your analyses. An example of mine is provided below.

@
Faktura č. 1888800244 (Z)
    Dodavatel:    Úřad pro zastupování státu ve věcech majetkových (IČO 69797111)
    Suma:         9130.00 Kč (vystaveno 2015-07-01, splatnost 2018-05-15)
    Zaplaceno:    9130.00 Kč (  přijato 2018-05-09, zaplaceno 2018-05-04)
    Z rozpočtu:   1300.00 Kč (Studená voda), 3300.00 Kč (Teplo), 4000.00 Kč (Elektrická energie), 530.00 Kč (Teplá voda)
@
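
A small sketch of how the money formatting in the example above could be
produced with @Text.Printf@ from @base@. The function names are made up for
illustration, and the sub-budget names are assumed to be already looked up.

```haskell
import Data.List (intercalate)
import Text.Printf (printf)

-- Amount in CZK with exactly two decimal places.
ppMoney :: Double -> String
ppMoney = printf "%.2f Kč"

-- The "Z rozpočtu" line: amounts paired with resolved sub-budget names.
ppSubBudgets :: [(Double, String)] -> String
ppSubBudgets xs =
  "    Z rozpočtu:   "
    ++ intercalate ", " [ppMoney m ++ " (" ++ n ++ ")" | (m, n) <- xs]
```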

= Analysis details

As for the performed analyses, you are not constrained. Run whatever you find
intriguing and report a summary in the source file you submit to the IS.
Note, however, that the source code producing your stats has to be included
in the submitted source file! I.e., stats reported in free text without the
accompanying Haskell code will not be counted as valid (as you could have
generated them using Excel on the same dataset).

Optionally, paste the summary of the findings (without the source
code) to the discussion forum in the IS (even before the assignment deadline).
Remember, one of the aims of the task is to get to know the economics of your
ministry.

Inspiration for possible analyses:

  * Which invoices did the ministry pay to Masaryk University?
  * What ratio of invoices did the ministry pay overdue?
    (Remember that some invoices may have been delivered already overdue.)
  * What are the sums for individual sub-budgets?
  * To which supplier did the ministry pay the most?
  * How much did the ministry pay to the companies owned by the current 
    prime minister?
  * And so on...
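
Two of the ideas above, sketched on a hypothetical cut-down invoice type
(@Inv@ and its fields are assumptions, not the assignment's 'Invoice'). In a
real solution, the list would probably come from the parsed invoice map.

```haskell
import qualified Data.Map.Strict as M
import Data.Time.Calendar (Day, fromGregorian)

-- Hypothetical cut-down invoice.
data Inv = Inv
  { due   :: Day
  , paid  :: Day
  , parts :: [(Double, Int)]  -- (amount, sub-budget ID)
  }

-- Total amount drawn from each sub-budget.
budgetSums :: [Inv] -> M.Map Int Double
budgetSums invs = M.fromListWith (+) [(b, m) | i <- invs, (m, b) <- parts i]

-- Fraction of invoices paid after their due date (0 when there are none).
overdueRatio :: [Inv] -> Double
overdueRatio []   = 0
overdueRatio invs = fromIntegral late / fromIntegral (length invs)
  where late = length (filter (\i -> paid i > due i) invs)

-- Made-up sample data, only for trying the functions out.
sample :: [Inv]
sample =
  [ Inv (fromGregorian 2018 5 15) (fromGregorian 2018 5 4)  [(100, 1), (50, 2)]
  , Inv (fromGregorian 2018 6 1)  (fromGregorian 2018 6 10) [(50, 1)]
  ]
```

Note that this simple ratio does not account for invoices that were already
overdue on delivery; that would need the delivery date as well.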

= Bonuses

During grading, you can get up to 3 bonus points for extra work (either
other code features or better analysis). These are assigned subjectively but
will probably be awarded if you do any of the following:

  * Perform larger or more complicated analyses of the dataset.
  * Perform analyses on the extended dataset (e.g., incorporating invoice data
    from other years).
  * Use some advanced concept from the other seminars in the solution (e.g.,
    /lenses/ for manipulating the data structures, /monoids/, appropriate
    language extensions, ...)
  * Note: If you decide to use lenses, feel free to rename record field names
    to start with an underscore. You may also consider using the package
    <https://hackage.haskell.org/package/lens-datetime lens-datetime>.
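
As a hedged illustration of the monoid idea, several statistics can be
gathered in one pass by folding into a tuple of monoids. The @summary@ name is
hypothetical, and this is just one way monoids might be put to use here, not
necessarily how the author's solution does it.

```haskell
import Data.Monoid (Sum (..))
import Data.Semigroup (Max (..))

-- Count, total and largest amount, gathered in a single 'foldMap' pass.
-- A tuple of monoids is itself a monoid, so the parts combine pointwise;
-- 'Maybe' supplies the empty case for 'Max'.
summary :: [Double] -> (Sum Int, Sum Double, Maybe (Max Double))
summary = foldMap (\x -> (Sum 1, Sum x, Just (Max x)))
```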

= Modules and packages

You can use any module from packages
<https://hackage.haskell.org/package/base/ base> and
<https://hackage.haskell.org/package/containers/ containers>.
For parsing, use the package
<https://hackage.haskell.org/package/parsec/ parsec>.
For working with dates, use the package
<https://hackage.haskell.org/package/time/ time>.
If you wish, you can also use Unicode syntax from
<https://hackage.haskell.org/package/unicode-prelude unicode-prelude>.

In case you feel the need to use some other package (especially in the 
analytical part), it's probably OK. However, double-check
with the assignment author in the discussion forum first.
-}

-- ----------------------------------------------------------------------------
-- Name:
-- UCO:
-- ----------------------------------------------------------------------------

module HW04 where

-- package containers
import qualified Data.Map.Strict as M
-- package parsec
import Text.Parsec
import Text.Parsec.String ( Parser, parseFromFile )
-- package time
import Data.Time.Calendar ( Day )

-- #### Data type declarations ####

-- | The high-level data structure for all parsed data.
data InvoiceData = InvoiceData
  { invoices  :: Invoices  -- ^ selected invoice data (see below)
  , suppliers :: Suppliers -- ^ information about common suppliers (those having IČO)
  , budgets   :: Budgets   -- ^ information about ministry sub-budgets
  , notParsed :: [String]  -- ^ list of lines that could not be successfully parsed
  }

-- | Invoices are stored in a map keyed by invoice ID (@[CISLO]@).
-- Beware, the dataset contains multiple lines with the same invoice ID.
-- As these differ only in the sub-budget payment, merge them together
-- (keeping all the sub-budget information).
type Invoices = M.Map InvoiceID Invoice

-- | Supplier names (@[DODAVATEL]@) are stored in a map keyed by their IČO (@[ICO]@).
-- Suppliers without the ICO identification are not stored here.
type Suppliers = M.Map ICO String

-- | Sub-budget names (@[NAZEVPOLOZKYROZPOCTU]@) are stored in a map keyed 
-- by the budget ID (@[POLOZKAROZPOCTU]@).
type Budgets = M.Map SubBudgetID String

-- | Invoice ID (@[CISLO]@) is internally an @Int@ but is wrapped in a newtype
-- to ensure type safety.
newtype InvoiceID = InvoiceID { unInvoiceID :: Int } deriving (Eq, Ord, Show)

-- | Supplier IČO (@[ICO]@) is internally an @Int@ but is wrapped in a newtype
-- to ensure type safety.
newtype ICO = ICO { unICO :: Int } deriving (Eq, Ord, Show)

-- | Sub-budget ID (@[POLOZKAROZPOCTU]@) is internally an @Int@ but is wrapped in a newtype
-- to ensure type safety.
newtype SubBudgetID = SubBudgetID { unSubBudgetID :: Int } deriving (Eq, Ord, Show)

-- | All invoice metadata. Supplier and sub-budget are identified by IDs only.
data Invoice = Invoice
  { supplier      :: Either String ICO  -- ^ supplier IČO (@[ICO]@) if exists, supplier name (@[DODAVATEL]@) if not
  , dateIssued    :: Day                -- ^ date the invoice was issued (@[DATUMVYSTAVENI]@)
  , dateDelivered :: Day                -- ^ date the invoice was delivered (@[DATUMPRIJETI]@)
  , dateDue       :: Day                -- ^ date the invoice was due (@[DATUMSPLATNOSTI]@)
  , datePaid      :: Day                -- ^ date the invoice was paid (@[DATUMUHRADY]@)
  , documentType  :: DocumentType       -- ^ invoice type (@[TYPDOKLADU]@)
  , amountDue     :: Money              -- ^ amount due in CZK, VAT included (@[CELKOVACASTKA]@)
  , amountPaid    :: Money              -- ^ amount paid in CZK, VAT included (@[CUHRADA]@)
  , subBudgets :: [(Money,SubBudgetID)] -- ^ amounts from individual sub-budgets (@[CASTKAZAPOLOZKUROZPOCTU]@, @[POLOZKAROZPOCTU]@)
  } deriving Show

-- | Type of the invoice as provided in the data (@[TYPDOKLADU]@).
-- Unfortunately, I was unable to find out what precisely these mean :-|.
data DocumentType = F -- ^ maybe a common invoice?
                  | W -- ^ maybe a cancelled invoice?
                  | Z -- ^ maybe a regular invoice paid in advance?
                  deriving (Eq, Show)

-- | Money amounts are stored as simple @Double@s.
type Money = Double

-- #### Custom data types and class instances ####

-- TBA

-- #### Constants ####

-- TBA

-- #### Parsers ####

-- TBA

-- #### Pretty printers ####

-- TBA

-- #### Data analysis utility functions ####

-- TBA

-- | Parse the file given in the first command-line argument.
-- In case of successful parse, pretty-print the parsed database.
-- In case of parse failure, print the error.
main :: IO ()
main = undefined