Ubuntu 14.04 LTS 64-bit, minimal install, updated.
2x 6 core Xeon,
12 GB ECC memory,
Storage RAID 10 = 4 TB,
File system = ext4,
The above server is dedicated to this project.
Goal: grep more efficiently, get fewer false positives and “cleaner” results, and export only the email addresses to a txt file.
I have many large files in all kinds of formats: .csv, .excel, .txt, .sql, etc.
Some files are compressed zip, rar, gz etc. (I will be attempting
The files reside on a Windows 2012 server. I have mounted the share on the Ubuntu box, and I need to extract all email addresses to a txt file.
I have done tons of research and played with various regexes, but I cannot get it working 100% as expected.
First attempt:
grep -Rs .*@.* . >> emails.txt
Second attempt: (after research)
grep -e '^.*@.*..*' -r -n -h >> emails.txt
Third attempt: (for better performance)
LANG=C grep -e '^.*@.*..*' -r -n -h >> emails.txt
Fourth attempt: (even “better” performance, but this depends on hardware)
cat * */* */*/* | parallel --pipe -N 250 --round-robin "grep -e '^.*@.*..*' -r -n -h >> emails.txt"
With the first, second, and third attempts, I am still getting a ton of “junk” exported.
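The junk most likely comes from two things: a pattern such as '^.*@.*..*' matches any line that contains an @ followed by at least one more character, and grep prints the whole matching line unless told otherwise. A minimal sketch of a tighter approach follows, assuming GNU grep. The function name extract_emails is hypothetical, and the regex is a deliberately simple approximation of an email address, not a complete validator.

```shell
# Minimal sketch (assumption: GNU grep). Prints only the matched addresses,
# one per line, instead of the whole line they appear on.
# extract_emails is a hypothetical helper name.
extract_emails() {
    # -r  recurse into directories
    # -h  do not prefix matches with file names
    # -o  print only the part of the line that matched
    # -I  skip files that look binary
    # -s  stay quiet about unreadable or missing files
    # -E  extended regex; LC_ALL=C keeps A-Z and a-z as plain ASCII ranges
    LC_ALL=C grep -rhoIsE \
        '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "${1:-.}"
}
```

It could be used as extract_emails /path/to/share | sort -u > ~/emails.txt, where /path/to/share is a placeholder for the mounted share. Writing the output outside the searched tree keeps grep from reading its own results.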
With the fourth example, cat still complains about folders. I tried running it with find . instead, but then the output lists only the files that contain the mail accounts.
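The cat complaint arises because globs such as * and */* expand to directories as well as files. One possible way around it, sketched below under the assumption of GNU find, xargs, and grep, is to let find select regular files only and pass their names to several grep processes. The name grep_files_parallel and the default of 4 jobs are hypothetical, and the regex is the same simple approximation of an email address.

```shell
# Minimal sketch (assumption: GNU find, xargs and grep).
# grep_files_parallel is a hypothetical helper: $1 = directory, $2 = parallel jobs.
grep_files_parallel() {
    dir="${1:-.}"
    jobs="${2:-4}"
    # -type f         regular files only, so directories are never opened
    # -print0 / -0    NUL-separated names survive spaces and newlines
    # -r              do not run grep at all if find returns nothing
    # -n 250          each grep receives up to 250 file names
    # -P "$jobs"      run that many grep processes at once
    # --line-buffered write one line at a time so parallel output does not interleave mid-line
    find "$dir" -type f -print0 |
        LC_ALL=C xargs -0 -r -n 250 -P "$jobs" \
            grep -hoIsE --line-buffered \
                '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
}
```

For example, grep_files_parallel /path/to/share 8 | sort -u > ~/emails.txt, with /path/to/share again a placeholder. Whether more jobs actually help depends on how fast the network share can deliver data.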
Any and all assistance will be greatly appreciated.