

ID: 3678130 • Letter: B

Question

BNK File Converter:

Assume that you are working for a local bank to develop a new banking system. The current system stores bank accounts in a text file in the format commonly called "comma-separated values" (CSV). The first line of the file contains the field names, and each remaining line holds the details of one account. All data values are separated by commas. Here is an example snippet of such a file.

Account,Name,Balance
333,Kate Wilson,1512.34
101,Adam Smith,100.23
212,Mary Lee,-10.56

It is easy to see that this is not an efficient way to store and manage data. For example, when a customer deposits a check, the current system needs to find the customer's account and update its balance. Because this is a text file, the system has to read the file from the beginning until it finds the line for that account. Then it has to rewrite the whole file so that the line can be updated with the new balance. (Lines in a text file vary in length, so in general a line cannot be updated in place.) Thus, in the new system, you decide to store bank accounts in a binary file, because with fixed-size records you can read and update one specific part of the file in place without touching the rest. In addition, you can maintain an in-memory index that makes searching for the accounts stored in the file faster. You call this file format BNK and design it with the following structure.

Header

The first part of a BNK file is the header. It is 32 bytes long and stores the following information:

The first 4 bytes are the character sequence BANK, which is used as a signature for BNK files. That means that if a file does not start with the 4 characters BANK, it is not a BNK file.

The next item is the total number of accounts, stored as a 4-byte integer.

The remaining 24 bytes are reserved for future use.

This header can be declared in C++ as a struct, for example:
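A minimal sketch of such a declaration, using the field names that appear in the answer below. It assumes that int is 4 bytes on the target platform and a C++11 compiler (for static_assert); the helpers makeHeader and hasBankSignature are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <cstring>

// Sketch of the 32-byte BNK header described above.
struct BNKHeader {
    char signature[4];      // 'B','A','N','K' (no terminating NUL)
    int  numberOfAccounts;  // N, assumed to be a 4-byte int
    char reserved[24];      // reserved for future use
};

// Compilation fails if the platform lays the struct out differently.
static_assert(sizeof(BNKHeader) == 32, "BNKHeader must be exactly 32 bytes");

// Hypothetical helper: builds a header for n accounts with the reserved bytes zeroed.
inline BNKHeader makeHeader(int n) {
    assert(n >= 0);
    BNKHeader h;
    std::memset(&h, 0, sizeof(h));
    std::memcpy(h.signature, "BANK", 4);
    h.numberOfAccounts = n;
    return h;
}

// Hypothetical helper: true if the header starts with the BANK signature.
inline bool hasBankSignature(const BNKHeader& h) {
    return std::memcmp(h.signature, "BANK", 4) == 0;
}
```

A reader of the format would load the first sizeof(BNKHeader) bytes and reject the file when hasBankSignature returns false.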

Account Data

If N is the total number of accounts, there will be N account records stored consecutively after the header. Each record stores the account number (a 4-byte integer), the holder name (at most 20 characters, including the terminating NUL), the balance (a double value of 8 bytes), and a reserved space of 96 bytes. That means the size of an account record is 4 + 20 + 8 + 96 = 128 bytes. We can declare such a record as a struct in C++ as follows:
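A sketch of the record struct, again with the field names used in the answer below and assuming a 4-byte int and an 8-byte double. The helper makeAccount is a hypothetical addition that shows how a name can be stored without losing the terminating NUL.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Sketch of the 128-byte account record described above.
struct BNKAccount {
    int    number;        // account number, assumed 4 bytes
    char   name[20];      // holder name: at most 19 characters plus the NUL
    double balance;       // assumed 8 bytes
    char   reserved[96];  // reserved for future use
};

// Compilation fails if padding or type sizes change the record size.
static_assert(sizeof(BNKAccount) == 128, "BNKAccount must be exactly 128 bytes");

// Hypothetical helper: fills a zeroed record, truncating names longer than 19 characters.
inline BNKAccount makeAccount(int number, const std::string& name, double balance) {
    BNKAccount a;
    std::memset(&a, 0, sizeof(a));
    a.number = number;
    std::strncpy(a.name, name.c_str(), sizeof(a.name) - 1);
    assert(a.name[sizeof(a.name) - 1] == '\0');  // last byte is still the NUL
    a.balance = balance;
    return a;
}
```

Zeroing the whole record first also gives the reserved bytes a defined value before they are written to disk.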

Index Data

The last part of a BNK file contains N consecutive index records. Each record contains an account number and the position of the corresponding account record in the BNK file. The index records are sorted by account number in increasing order, so the index can later be searched with binary search. For example, for the account data given above in the CSV file, the record of account 333 is stored at position 32 (right after the 32-byte header), that of account 101 is stored at position 32+128=160, and that of account 212 is stored at position 160+128=288. The index data will contain the following records: {101, 160}, {212, 288}, and {333, 32}. With such file positions, we can access an account easily. For example, after loading the index into memory, if we want the record of account 212, we search for account number 212 in the index and find that its position in the BNK file is 288.
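Once a position such as 288 is known, a record can be read or changed in place by seeking to it. The sketch below illustrates this under the record layout described above; readAccountAt and updateBalanceAt are hypothetical helper names, and the code works on any seekable stream (for example a std::fstream opened in binary mode).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>

struct BNKAccount {
    int    number;
    char   name[20];
    double balance;
    char   reserved[96];
};

// Reads the 128-byte record stored at byte offset pos. Returns false if the read fails.
inline bool readAccountAt(std::istream& in, std::streamoff pos, BNKAccount& out) {
    assert(pos >= 0);
    in.clear();
    in.seekg(pos, std::ios::beg);
    in.read(reinterpret_cast<char*>(&out), sizeof(out));
    return static_cast<bool>(in);
}

// Overwrites only the balance field of the record at pos; no other byte is touched.
inline bool updateBalanceAt(std::ostream& out, std::streamoff pos, double newBalance) {
    assert(pos >= 0);
    out.clear();
    const std::streamoff fieldOffset =
        static_cast<std::streamoff>(offsetof(BNKAccount, balance));
    out.seekp(pos + fieldOffset, std::ios::beg);
    out.write(reinterpret_cast<const char*>(&newBalance), sizeof(newBalance));
    return static_cast<bool>(out);
}
```

With the example data, updateBalanceAt(file, 288, 89.44) changes only the 8 balance bytes of account 212, which is the operation the CSV format cannot do without rewriting the file.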

Index records can be declared using the following struct in C++:
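A sketch of an index record together with the lookup described above. The field names follow the answer below; the type long for the position is an assumption taken from that answer (its size differs between platforms, so a fixed-width type may be preferable if files are exchanged between machines). byAccountNumber and findPosition are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Sketch of one index record: key plus byte offset of the account record.
struct BNKIndex {
    int  accountNumber;
    long filePosition;
};

// Ordering used both for sorting the index and for searching it.
inline bool byAccountNumber(const BNKIndex& a, const BNKIndex& b) {
    return a.accountNumber < b.accountNumber;
}

// Binary search over an index sorted by account number.
// Returns the file position of the account, or -1 if it is not in the index.
inline long findPosition(const BNKIndex* index, std::size_t n, int accountNumber) {
    assert(index != 0 || n == 0);
    BNKIndex key = { accountNumber, 0 };
    const BNKIndex* end = index + n;
    const BNKIndex* it = std::lower_bound(index, end, key, byAccountNumber);
    if (it != end && it->accountNumber == accountNumber)
        return it->filePosition;
    return -1;
}
```

For the three example accounts, sorting {333, 32}, {101, 160}, {212, 288} with std::sort and byAccountNumber yields the order given above, and findPosition(index, 3, 212) returns 288.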

Attention: To ensure the portability of your code, you should always use the sizeof operator to compute the size of those structs. For example, to read an array of N BNKIndex structs from a binary file into memory, you could use this statement:
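One way such a read could look, shown together with the matching write. This is a sketch, not the assignment's original statement: it uses std::vector instead of a raw array, and readIndex and writeIndex are hypothetical helper names. The byte count is always computed as n * sizeof(BNKIndex) rather than hard-coded.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

struct BNKIndex {
    int  accountNumber;
    long filePosition;
};

// Writes n index records as one contiguous block of bytes.
inline void writeIndex(std::ostream& out, const BNKIndex* index, std::size_t n) {
    assert(index != 0 || n == 0);
    out.write(reinterpret_cast<const char*>(index),
              static_cast<std::streamsize>(n * sizeof(BNKIndex)));
}

// Reads n index records from the current read position.
// Returns an empty vector if the stream could not supply all of them.
inline std::vector<BNKIndex> readIndex(std::istream& in, std::size_t n) {
    std::vector<BNKIndex> index(n);
    if (n > 0)
        in.read(reinterpret_cast<char*>(&index[0]),
                static_cast<std::streamsize>(n * sizeof(BNKIndex)));
    if (!in)
        index.clear();
    return index;
}
```

Note that the pointer passed to read or write is the address of the first element, not the address of a pointer variable that refers to the array.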

Content:

An example CSV file can be found below:

Account,Name,Balance
5392,Phil Carr,0.0
1798,Ruth Parsons,100.45
5467,Jake Sanderson,457.23
432,Benjamin Ross,-123.65
60,Keith Ellison,-87.45
355,Pippa Davidson,653.56
1727,Katherine Lambert,1356.86
8584,Nicola Chapman,42524.65
9355,Irene Piper,87324.44
57273,Isaac Wallace,6454.54
64212,Julia Robertson,-452.54
26871,Diana Dowd,629.56
46856,Lisa Howard,-1654.68
2251,Carolyn Vaughan,72.29
53611,Eric Manning,123.65
5020,Adrian Simpson,-4445.89
27363,Kevin Graham,752.76
13789,Nicholas Kelly,666.67
128,Brandon Bailey,777.78
3800,David Vance,-444.45
1584,Mary Gill,111.11
61036,Vanessa Morrison,563.56
53057,Nicholas Rutherford,526.09
2641,Fiona Parsons,25.35
30594,Warren Lyman,268.87
10287,Kylie Hudson,-566.65
65504,Adrian Dickens,-245.46
55900,Felicity Bailey,9865.78
27661,Sarah James,5609.56
50171,Peter Lawrence,-4555.54
31516,Amanda Oliver,-4579.45
61504,Ruth Lyman,76.45
40828,Amelia Wright,0.00
44035,Blake Sutherland,45.45
58214,Wanda Simpson,425245.45
22450,Sam Walsh,9999.99
24698,Richard Bell,342.56
31078,Maria Hudson,-4625.76
39021,Joan Duncan,-7653.87
28115,Ian Bond,8495.65
35401,Julian Bailey,8438.23
42862,Harry Kerr,23.32
10542,Jane Walker,54.45
23035,Sally Robertson,56.65
43481,Connor Simpson,76.67
40233,Zoe Hardacre,-87.78
2912,Megan Reid,89.98
52121,Lisa Short,12.21
24062,Una Young,32.23
30694,Neil King,43.34

Programming Tasks:

Task 1 (20 points):

You need to write a program that converts a CSV file to a BNK file. The program can follow this procedure:

Open the CSV file (as text, for input) and BNK file (as binary, for output).

Read quickly through the CSV file (e.g. just counting the number of lines without parsing each line) to determine the total number of accounts (N).

Put N into a BNKHeader struct and write it to the BNK file.

Allocate a (dynamic) array of N BNKIndex records for the in-memory index data.

Re-read the CSV file line by line, parse each line into a BNKAccount record, and write it to the BNK file. Before writing, also add the information for this BNKAccount record to the in-memory index array (i.e., its account number and its file position, which is provided by the function tellp).

After parsing the CSV file, sort the index array by account number and write it to the BNK file.
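The counting and parsing steps of this procedure can be sketched as follows. countAccounts and parseAccountLine are hypothetical helper names, the sketch assumes that names contain no commas, and it does not handle quoted CSV fields.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

struct BNKAccount {
    int    number;
    char   name[20];
    double balance;
    char   reserved[96];
};

// First pass: counts the non-empty lines after the field-name line.
inline int countAccounts(std::istream& csv) {
    std::string line;
    int n = 0;
    if (!std::getline(csv, line))
        return 0;                      // no header line: empty input
    while (std::getline(csv, line))
        if (!line.empty())
            ++n;
    return n;
}

// Second pass: parses one "number,name,balance" line into a zeroed record.
// Returns false if the line does not have three comma-separated fields.
inline bool parseAccountLine(const std::string& line, BNKAccount& out) {
    std::istringstream ss(line);
    std::string number, name, balance;
    if (!std::getline(ss, number, ',') || !std::getline(ss, name, ',') ||
        !std::getline(ss, balance))
        return false;
    std::memset(&out, 0, sizeof(out));
    out.number = std::atoi(number.c_str());
    std::strncpy(out.name, name.c_str(), sizeof(out.name) - 1);
    assert(out.name[sizeof(out.name) - 1] == '\0');  // name stays NUL-terminated
    out.balance = std::atof(balance.c_str());
    return true;
}
```

Parsing each line into its own three named fields avoids keeping a running field counter across lines, so a malformed line cannot shift the fields of the lines after it.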

Explanation / Answer

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <cstring>

using namespace std;

struct BNKHeader {
   char signature[4]; // {'B','A','N','K'};
   int numberOfAccounts;
   char reserved[24];
};

struct BNKIndex {
   int accountNumber;
   long filePosition;
};

struct BNKAccount {
   int number;
   char name[20];
   double balance;
   char reserved[96];
};

void writeSig(char sig[]) {
   sig[0] = 'B';
   sig[1] = 'A';
   sig[2] = 'N';
   sig[3] = 'K';
}

void QuickSort(BNKIndex items[], int first, int last) {
   int pivot = items[(first + last) / 2].accountNumber;
   int i = first, j = last;

   while (i <= j) {
       while (items[i].accountNumber < pivot)
           i++;       // find items[i] in the left that >= pivot
       while (items[j].accountNumber > pivot)
           j--;       // find items[j] in the right that <= pivot
       if (i <= j) { // they are not in order
           BNKIndex tmp = items[i];
           items[i] = items[j];
           items[j] = tmp; // exchange
           i++;
           j--; // move on
       }
   }       // stop when no such pair of items exists

   if (first < j)
       QuickSort(items, first, j); // sort similar for the left part
   if (i < last)
       QuickSort(items, i, last); // sort similar for the right part
}

int main() {
   int N = 0;
   int count = 0;
   int counter = 0;
   string line, acc, name, balance;
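   ifstream file;
   file.open("input.txt");
   if (!file) { // without this check a missing file would silently produce a bogus N
       cerr << "Cannot open input.txt" << endl;
       return 1;
   }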
   ofstream bin;
   // counting number of accounts in the file.
   while (getline(file, line))
       ++N;
   file.close();
   // don't count the header as an account (an empty file has no accounts at all)
   N = (N > 0) ? N - 1 : 0;
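   // instantiating the structs where we'll store the data.
   // Zero them first so the reserved bytes written to the file have a defined value.
   struct BNKHeader header;
   struct BNKAccount data;
   memset(&header, 0, sizeof(header));
   memset(&data, 0, sizeof(data));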
   bin.open("output.bnk", ios::binary); // binary mode: no newline translation of the record bytes
   //writing to the header struct
   header.numberOfAccounts = N;
   writeSig(header.signature); // writing "BANK" to struct
   //writing the header to the bin file.
   bin.write(header.signature, 4); // 4 bytes of data
   bin.write((char*) &header.numberOfAccounts,
           sizeof(header.numberOfAccounts));
   bin.write(header.reserved, 24); // 24 bytes of data.
   // instantiating a dynamic array of size N.
   BNKIndex *indexArray = new BNKIndex[N];
   BNKIndex tmpIndex;
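   // Reopen the CSV file for the second pass. clear() resets the EOF/fail flags left
   // by the counting loop, which older (pre-C++11) libraries do not reset in open().
   file.clear();
   file.open("input.txt");
   file.ignore(1000, '\n'); // skip the header line; it holds field names, not an account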
   while (getline(file, line)) {
       stringstream ss(line);
       // we fill the struct, write it to the file, then overwrite it for the next line.
       while (getline(ss, acc, ',')) {
           // handy algo to figure out where each bit of data needs to go.
           if (count % 3 == 0) {
               data.number = atoi(acc.c_str());
           } else if ((count - 1) % 3 == 0) {
               memset(data.name, 0, sizeof(data.name));
               acc.copy(data.name, sizeof(data.name) - 1, 0); // at most 19 chars, so the last byte stays NUL
           } else if ((count - 2) % 3 == 0) {
               data.balance = atof(acc.c_str());
           }
           if ((count - 2) % 3 == 0) {
               // add this record to the indexArray (account number and its file position, to search for later).
               tmpIndex.accountNumber = data.number;
               tmpIndex.filePosition = bin.tellp();
               indexArray[counter++] = tmpIndex;

               // writing the data to the file.
               bin.write((char*) &data.number, sizeof(data.number));
               bin.write(data.name, 20);
               bin.write((char*) &data.balance, sizeof(data.balance));
               bin.write(data.reserved, 96);
           }
           // increase count for the algorithm above.
           count++;
       }

   }
   //DEBUG CODE TO CHECK BEFORE QUICKSORT
   /*
   cout << "PRESORT" << endl;
   for(int i = 0; i < N;i ++){
       cout << indexArray[i].accountNumber<< endl;
       cout << indexArray[i].filePosition << endl;
   }
   */
   if (N > 1) // nothing to sort for 0 or 1 accounts
       QuickSort(indexArray, 0, N-1);
   //DEBUG CODE TO CHECK AFTER QUICKSORT
   /*
   cout << "POSTSORT" << endl;
   for(int i = 0; i < N;i ++){
       cout << indexArray[i].accountNumber<< endl;
       cout << indexArray[i].filePosition << endl;
   }
   */
   // write the sorted index to the file, one record at a time.
   for(int i = 0;i < N;i++){
       bin.write((char*)&indexArray[i], sizeof(BNKIndex));
   }
   // A single call also works, but it must pass the array itself. The earlier attempt
   // passed &indexArray, the address of the pointer variable, so it wrote garbage:
   //bin.write((char*) indexArray, N * sizeof(BNKIndex));
   delete[] indexArray;
   indexArray = NULL;
   // closing the bin after writing the data to it.
   bin.close();
}
input.txt
Account,Name,Balance
333,Kate Wilson,1512.34
101,Adam Smith,100.23
212,Mary Lee,-10.56
1,Joseph Smith,100000.56
210,Jo Mama,-29.96
69,Pamela Anderson,69.69
43,George W. Bush,250000.00
13,Test Account,-200.56
2500,jason sheppard,-10.56212
2512,ANOTHER LINE TEST,-70.56