Compare two unsorted files line by line and output only lines that are in both file 1 and file 2, or only in file 2, but not lines that are only in file 1

I need to compare two unsorted files line by line and output only the lines that are in both file 1 and file 2, or only in file 2, but not the lines that are only in file 1. In other words, I want the new + same strings across the two files, excluding the old strings that no longer exist in file 2.

I need to do this on some very large files (10+ GB, about 1,000,000 lines).

I have tried the options below, but none of them gives me exactly what I need:

join -v1 -v2 <(sort File1.txt) <(sort File2.txt) > File3.txt

This join seems to give me the lines that are in File1.txt and/or File2.txt, essentially a combine + unique. This is almost correct, but I need it to exclude lines/strings that are in the first file but not in the second file.
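For comparison, here is a sort-based sketch using comm. Unlike join, which matches on the first whitespace-separated field by default and with -v1 -v2 prints only the unpaired lines, comm compares whole lines of two sorted inputs. The sketch assumes standard sort/comm/mktemp, sets LC_ALL=C so both tools agree on byte-wise ordering, collapses duplicate lines with sort -u, and assumes no line of File2.txt starts with a tab; compare_lines and its modes are hypothetical names.

```shell
# Sketch: classify whole lines of two files with sort + comm.
# compare_lines is a hypothetical helper, not an existing tool.
# Usage: compare_lines MODE FILE1 FILE2
#   new          -> lines only in FILE2      (comm -13)
#   same         -> lines in both files      (comm -12)
#   new_and_same -> drop lines only in FILE1 (comm -1)
compare_lines() {
  mode=$1
  # Temporary sorted copies; plain POSIX sh has no <( ) process substitution.
  s1=$(mktemp) || return 1
  s2=$(mktemp) || { rm -f "$s1"; return 1; }
  # LC_ALL=C: identical byte-wise collation for sort and comm.
  LC_ALL=C sort -u "$2" > "$s1"
  LC_ALL=C sort -u "$3" > "$s2"
  case $mode in
    new)  LC_ALL=C comm -13 "$s1" "$s2" ;;
    same) LC_ALL=C comm -12 "$s1" "$s2" ;;
    # With -1, column 2 is printed unindented and column 3 behind one tab;
    # sed strips that single leading tab (assumes lines don't start with a tab).
    new_and_same)
      LC_ALL=C comm -1 "$s1" "$s2" | sed "s/^$(printf '\t')//" ;;
    *)
      echo "unknown mode: $mode" >&2
      rm -f "$s1" "$s2"
      return 1 ;;
  esac
  rm -f "$s1" "$s2"
}
```

For example, `compare_lines new File1.txt File2.txt > File3.txt` gives only the new lines. Note that the new_and_same mode ends up equal to the sorted unique lines of File2.txt, since every line of File2.txt is either new or shared.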

fgrep -vf File1.txt File2.txt > File3.txt

This works, but as you know, it is very slow on large files, so it is not really an option.
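A possible hash-based alternative to fgrep -vf, sketched under some assumptions: awk loads File1.txt into an associative array once, then streams File2.txt and looks up each whole line, so there is no per-line pattern matching and no sorting. This assumes File1.txt fits in memory (a 10+ GB file may not) and that exact whole-line matching is wanted; fgrep -vf without -x also drops lines that merely contain a line of File1.txt as a substring. new_lines and same_lines are hypothetical helper names.

```shell
# Sketch: single-pass hash lookup with awk.
# FILENAME == ARGV[1] (rather than NR == FNR) keeps this correct
# even when FILE1 is empty.

# Lines of FILE2 that never appear as a whole line in FILE1 ("new").
new_lines() {
  awk 'FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)' "$1" "$2"
}

# Lines of FILE2 that also appear as a whole line in FILE1 ("same").
same_lines() {
  awk 'FILENAME == ARGV[1] { seen[$0]; next } $0 in seen' "$1" "$2"
}
```

Usage would be `new_lines File1.txt File2.txt > File3.txt`. Unlike the sort-based approaches, the output keeps File2.txt's original order and any duplicate lines.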

Case sensitivity would be nice but is not at all required. I mention this because in my research I found that a case-insensitive comparison would speed up the search a lot.
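If case-insensitive matching is acceptable, the same awk idea can fold case with tolower() on both sides. This is a sketch; new_lines_nocase is a hypothetical name. In this awk version, folding adds a tolower() call per line rather than saving work, so it is unlikely to be faster than an exact-match lookup; it only changes which lines count as equal.

```shell
# Sketch: case-insensitive "new lines" (lines of FILE2 not in FILE1).
# Keys are lowercased on both sides; the original line from FILE2 is printed.
new_lines_nocase() {
  awk 'FILENAME == ARGV[1] { seen[tolower($0)]; next }
       !(tolower($0) in seen)' "$1" "$2"
}
```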

Thanks again in advance.

