I have been going through the Apache Hadoop YARN book from Hortonworks, which explains two ways of running a YARN application.
My intent is to run a shell script (which compiles and runs various Java and Python scripts) over a set of folders. An easy metaphor: "unzipping 100 folders and logging their `ls` output".
Now say I want to parallelize the flow, so that each container handles 1-2 folders, and I ask for 50 such containers.
How do I do that using distributed shell? I have seen examples running `ls` / `whoami` / `uptime` / `hostname`, but that is not what I want. I want to run a script that takes a folder path as an argument and iterates over it, and I want to run that in a distributed fashion on YARN. Any help?
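To make the question concrete, here is a sketch of the kind of worker script I have in mind, one invocation per container, with the folder paths supplied as arguments (the unzip step is elided since it depends on my environment; `process_folders`, `listing.log`, and the demo folder names are all made up for illustration). The commented `yarn jar` invocation at the end is my understanding of how the stock distributed shell client from the Hadoop examples jar would launch it; note that, as far as I can tell, it passes the *same* `-shell_args` to every container, which is part of what I am asking about:

```shell
#!/usr/bin/env bash
set -euo pipefail

# process_folders: for each folder path given, log its listing.
# (In the real script, each path would first be unzipped/compiled/run.)
process_folders() {
  local log="listing.log"
  : > "$log"                        # truncate the log for this container
  for folder in "$@"; do
    # unzip "${folder}.zip" -d "$folder"   # extraction step, cluster-specific
    echo "== $folder ==" >> "$log"
    ls "$folder" >> "$log"
  done
}

# Local smoke test with two toy folders:
mkdir -p demo/folder1 demo/folder2
touch demo/folder1/a.txt demo/folder2/b.txt
process_folders demo/folder1 demo/folder2
cat listing.log

# My understanding of the launch (jar path and args are assumptions):
#
#   yarn jar "$HADOOP_HOME"/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-*.jar \
#     org.apache.hadoop.yarn.applications.distributedshell.Client \
#     -jar "$HADOOP_HOME"/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-*.jar \
#     -shell_script process_folders.sh \
#     -shell_args "folder1 folder2" \
#     -num_containers 50
```

Is something along these lines possible with distributed shell, and if so, how do I hand a *different* slice of the 100 folders to each of the 50 containers?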