Find the maximum and the N largest numbers (some in scientific notation) in a huge ASCII file

Background:

(1) Here is an extract from a huge ASCII file of around 700 MB:

0, 0, 0, 0, 0, 0, 0, 0, 3.043678e-05, 3.661498e-05, 2.070347e-05,
    2.47175e-05, 1.49877e-05, 3.031176e-05, 2.12128e-05, 2.817522e-05,
    1.802658e-05, 7.192285e-06, 8.467806e-06, 2.047874e-05, 9.621194e-05,
    4.467542e-05, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.000421869,
    5.0003081213, 0.0001938675, 8.70334e-05, 0.0002973858, 0.0003385935,
    8.763598e-05, 2.743326e-05, 0, 0.0001043894, 3.409237e-05, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0;

(2) I would like to do two tasks:

(2.1) Find the maximum among the numbers, which are separated by commas and semicolons.

It is 5.0003081213 in the above extracted lines.

(2.2) Find the 4 (say) largest values among the lines.

They are 5.0003081213, 0.000421869, 0.0003385935 and 0.0002973858 in the above extracted lines.


My thought:

(3) I expect to do the work with Perl.

(4) I think that I can match each number with ([0-9.e-]+).
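
A minimal sketch of how such a match might be applied, assuming the values are all non-negative as in the extract above. The function name extract_numbers is hypothetical, and the pattern is a stricter variant of ([0-9.e-]+), which would also match a lone "e", "." or "-". It prints every number it finds, one per line, reading the file line by line.

```shell
# Hypothetical helper: print every number found in the input, one per line.
# Assumes non-negative values written as digits, an optional fraction,
# and an optional exponent (e.g. 0, 0.000421869, 3.043678e-05).
extract_numbers() {
  perl -nle 'print for /(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)/g' "$@"
}

# Example on a short piece of the data
printf '0, 3.043678e-05, 5.0003081213;\n' | extract_numbers
```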


My Problem:

(5) However, I am new to Perl and Unix, and I do not know how to go on to find the maximum values.

(6) I searched similar questions for half a day and found that I might make use of List::Util. I do not know whether it is an appropriate choice for my problem, and I do not know how its functions are used.

(7) Say the numbers are contained in a file named input.txt. Is it possible to finish both tasks with a one-line script?

Thank you for your understanding. I greatly appreciate your help.


Source: unix
