We all love making people happy. And happiness sometimes comes with just that little bit of extra help. Recently we had a request for the simplest thing you could possibly imagine. One of our coworkers wanted something that converted a set of lines into an excel file. The import capabilities of excel are virtually limitless. But that doesn’t mean that the tools are always easy to use.

Let’s analyze the problem. So we have a list of lines where every line has to be in a separate excel cell. Good. And when does the next line start? In this case, it was obviously a formatted field. But it might have been after a fixed set of lines. So let’s build a little tool to just convert a set of lines in a data blob into a tab-separated line-by-line output.

Oh yes, and trimming leading and ending spaces would be greatly appreciated. And no, the data did not contain any tabulating characters. With these parameters, we set out to extend the “hello world” app we created in a set of previously published blog articles.

The basics

The input is determined by a number of fields. Most important is the ‘data’ field. This will contain the lines of data. A set of check-boxes will allow choosing the elimination of empty lines and trimming of spaces. If a fixed number of input lines per output line is used that number can be set. If the number is not set, counting will not be in effect. Alternatively, one of four pre-determined regular expressions can be used to fix the line end of the output. If both count and regular expression or defined the one that matches first will trigger output to move to the next line.

This is how the template looks like:

 

<TMPL_INCLUDE name="header.tmpl">
<!-- Page Content -->
<div class="container">
   <header class="jumbotron my-4">
      <img src="style/nxp2.png" class="float-right w-25">
      <h1 class="display-5">Convert to XLS</h1>
   </header>
   <div class="my-2">
      <form>
         <div class="form-group">
            <label for="textinput">Provide your text</label>
            <textarea class="form-control" id="textinput" rows="5" name="data"></textarea>
         </div>
         <div class="form-group">
            <label>Eliminate empty lines</label> <input class="form-checkbox" type="checkbox" name="emptyline" checked>
         </div>
         <div class="form-group">
            <label>Trim leading and ending spaces</label> <input class="form-checkbox" type="checkbox" name="trimspaces" checked>
         </div>
         <div class="form-group">
            <label>Count lines (zero for no count)</label> <input class="form-control" name="count" value="0">
         </div>
         <div class="form-group">
            <label>Regex detecting last line</label> <select class="form-control" name="re">
            <option value="date_regex">Date regex with slash (dd/mm/yyyy)</option>
            <option value="date_regex_dash">Date regex with dash (dd-mm-yyyy)</option>
            <option value="number">Number</option>
            <option value="float">Float</option>
            <option value"">No regex</option>
            </select
         </div>
         <input class="btn btn-primary" type="submit" value="Parse">
      </form>
   </div>
</div>
<!-- /.container -->
<TMPL_INCLUDE name="footer.tmpl">

Referring to the previous blog articles, this is a straight forward implementation of an HTML form that requests for the various parameters that we will use to parse data and generate the output.
Let’s have a look at the function parsing the data and using the parameters.

my %re= (
        date_regex => '^\d+\/\d+\/\d+$',
        date_regex_dash => '^\d+-\d+-\d+$',
        number => '^\d+$',
        float => '^\d+\.\d+$',
);
sub parselines {
        my $env = shift;

        my $data = $env->{req}->param('data');
        my $empty = $env->{req}->param('emptyline');
        my $trimspaces = $env->{req}->param('trimspaces');
        my $count = $env->{req}->param('count')+0;
        my $re = $re{$env->{req}->param('re')};
        if($data) {
                my $i;
                my @lines = ();
                my $out = '';
                for(split(/\n/, $data)) {
                        next if($empty and /^\s*$/);
                        if($trimspaces) {
                                s/^\s+//;
                                s/\s+$//;
                        }
                        push(@lines, $_);
                        $i++;
                        if($i==$count or ($re and /$re/)) {
                                $out .= join("\t", @lines)."\n";
                                @lines = ();
                                $i = 0;
                        }
                }
                return [200, ['content-type' => 'text/csv'], [$out]];
        }

        return Page::content($env);
}

We start by defining a hash of regular expressions and their reference name as the key. You can find these references in the selection of the HTML page. The routine itself uses the pre-parsed data stored and accessible via the ‘re’ and ‘data’ reference into the $env variable. We extract the relevant parameters and assign them to variables. Next, we determine if this is the first call to the routine. If there is no data, the entire “if” block is skipped and we simply end by visualizing the template that we have created above. Remember that if we give the template the same name as the URL we are using we can simply call the Page::content routine and there is no additional work that needs to be done.

In case we have data, we have a request that has been passed using the form. In this case, we start splitting up the data using a simple “split” function and splitting on the new-line character. Note that in general, this is a little short-sighted. We might need to include other line separating characters in the future but for the time being this seems to work fine. To run over all the lines we start by checking if the line is empty, and if the “eliminate empty lines” flag is set and true we just skip to the next line. Next up is the trimming of the spaces, then pushing the current line on the stack.

The last part is to verify if we have reached the current end of the output line. Either the count needs to exact match or the regular expression has to match the content. At that point, we join all the fields in the list and add them as an extra line to the output. We then reset all variables to start creating the next line of output.

Eventually, all input lines will have been parsed and we simply return the “TAB” separated lines with content-type “text/csv” back to the browser. If the mime-type for text/csv of your browser is set up correctly, the right application will automatically open.

Try it yourself!

Before building this simple tool, check out how it works https://halloworld.eu-de.mybluemix.net/toxls.

And a big smile

Although we just scratched the surface of building a tool that does the conversion to excel, this little piece of code provided the right balance between development time and the horrible job of having to reshuffle hundreds of fields in excel. Hopefully, this little tool will put a smile on more faces.

At Nexperteam we love to help solve little problems that seem too simple to be programmed but could save a ton of time if tackled the right way. If you are stuck with little problems like these feel free to reach out, most likely a simple solution exists that will save you a lot of time … and put a smile on your face.