Running a custom shell script with Distributed Shell on Apache YARN

I have been going through the Apache Hadoop YARN book from Hortonworks, which explains two ways of running a YARN task.

My intent is to run a shell script (which compiles and runs various Java and Python scripts) that applies a set of these scripts/patches to many folders. A simple analogy: "unzipping 100 folders and logging their 'ls'".
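To make the analogy concrete, the per-folder unit of work might look like the Python sketch below. It assumes each "folder" arrives as a .zip archive and that "logging its ls" means printing the top-level names after extraction; the function name unzip_and_list is hypothetical and not part of the original script.

```python
import os
import zipfile


def unzip_and_list(archive_path: str, dest_dir: str) -> list[str]:
    """Extract one archive into dest_dir and return a sorted 'ls' of the result."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
    # Rough equivalent of `ls dest_dir`: top-level entry names, sorted.
    listing = sorted(os.listdir(dest_dir))
    # Log the listing so it ends up in the container's stdout log.
    print(f"{dest_dir}: {' '.join(listing)}")
    return listing
```

Keeping the unit of work as a function of one path makes it easy to call for any subset of folders, which is what the parallel version needs.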

Now say I want to parallelize the flow so that each container handles 1-2 folders, and I ask for 50 such containers.
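The arithmetic here is simple (100 folders / 50 containers = 2 folders each), but a general split should also handle counts that do not divide evenly. A minimal Python sketch, assuming the folder list is known up front; the name chunk is hypothetical:

```python
def chunk(items, num_chunks):
    """Split items into num_chunks contiguous slices whose sizes differ by at most one."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    base, extra = divmod(len(items), num_chunks)
    chunks = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

Container i would then process chunk(folders, 50)[i]. This only works if each container somehow learns its own index i, which, as far as I know, Distributed Shell does not provide out of the box.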

How do I do that using Distributed Shell? I have seen examples running ls, whoami, uptime, and hostname, but that is not what I want. I want to run a script that takes (or iterates over) a path argument, and I want to run it in a distributed fashion on YARN. Any help?
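In the Hadoop versions I am familiar with, the Distributed Shell client (org.apache.hadoop.yarn.applications.distributedshell.Client) launches the same command, given via -shell_command or -shell_script plus -shell_args, in every one of the -num_containers containers, so it does not hand each container a different folder. One workaround, sketched here under the assumption that every container can see a shared POSIX directory (for example an NFS mount; HDFS would need a different primitive), is to let each container claim folders itself by atomically creating a marker file. The names claim_folders and claim_dir are hypothetical.

```python
import os


def claim_folders(folders, claim_dir, max_folders=2):
    """Atomically claim up to max_folders unclaimed folders; return the ones claimed."""
    os.makedirs(claim_dir, exist_ok=True)
    claimed = []
    for folder in folders:
        if len(claimed) >= max_folders:
            break
        # Simplification: markers are keyed by basename, so folders must have unique names.
        marker = os.path.join(claim_dir, os.path.basename(folder.rstrip("/")) + ".claim")
        try:
            # O_CREAT | O_EXCL fails if the marker already exists, so only one
            # container can create it; every other container skips this folder.
            fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue
        with os.fdopen(fd, "w") as f:
            # Record the claimant; CONTAINER_ID is assumed to be set inside a YARN
            # container, with "local" as a fallback for testing outside YARN.
            f.write(os.environ.get("CONTAINER_ID", "local") + "\n")
        claimed.append(folder)
    return claimed
```

A hypothetical wrapper (say run_worker.sh calling this Python code) would loop on claim_folders until it returns an empty list, processing each claimed folder, and could be submitted with something along the lines of `yarn jar <distributedshell jar> org.apache.hadoop.yarn.applications.distributedshell.Client -jar <distributedshell jar> -shell_script run_worker.sh -num_containers 50`. Compared with precomputed chunks, claiming needs no per-container index and lets faster containers pick up more work, at the cost of requiring a shared filesystem with reliable exclusive create.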

