Created
October 11, 2011 13:10
# Storm Multi-Language Support
## The ShellBolt
Support for multiple languages is implemented via the ShellBolt class. This
class implements the IBolt interface and contains the facilities for
executing a script or program via the shell using Java's ProcessBuilder class.
## The Wrapper Class
You'll need to create a Java class that wraps your script and declares the fields
involved. You can learn more about this at https://github.com/nathanmarz/storm/wiki/Concepts
## Protocol Preamble
A simple protocol is implemented via the STDIN and STDOUT of the executed
script or program. A mix of simple strings and JSON encoded data is exchanged
with the process, which makes support possible for pretty much any language.
## Packaging Your Stuff
To run a ShellBolt on a cluster, the scripts that are shelled out to must be
in the resources directory within the jar submitted to the master.
However, during development or testing on a local machine, the resources
directory just needs to be on the classpath. It does not need to be contained
in the jar you create.
## The Protocol
Notes:
* Both ends of this protocol use a line-reading mechanism, so be sure to
  trim off newlines from the input and to append them to your output.
* All inputs will be terminated by a single line containing "end".
* The bullet points below are written from the perspective of the script writer's
  STDIN and STDOUT.

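The line-reading notes above can be sketched as a pair of small helpers. This is a minimal sketch, not Storm's own code: the names `read_msg` and `write_line` are hypothetical, and it assumes a message may span several lines before the terminating "end" line.

```python
import sys


def read_msg(stream=sys.stdin):
    """Collect lines until a line that is exactly "end" and return them joined."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            # readline() returns "" only at end of stream
            raise EOFError("stream closed before 'end' was seen")
        line = line.rstrip("\n")  # trim the newline, as the notes require
        if line == "end":
            break
        lines.append(line)
    return "\n".join(lines)


def write_line(text, stream=sys.stdout):
    """Write one line, appending the newline the other side expects."""
    stream.write(text + "\n")
    stream.flush()  # flush so the parent process sees the line immediately
```

Passing the stream as a parameter keeps the helpers easy to exercise with an in-memory buffer instead of the real STDIN and STDOUT.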
* Your script will be executed by the Bolt.
* STDIN: A string representing a path. This is a PID directory.
  Your script should create an empty file named with its PID in this directory. e.g.
  the PID is 1234, so an empty file named 1234 is created in the directory. This
  file lets the supervisor know the PID, since the PID returned via this protocol is used only for logging.
* STDOUT: Your PID. This is not JSON encoded, just a string.
* STDIN: (JSON) The Storm configuration. Various settings and properties.
* STDIN: (JSON) The Topology context
* STDIN: A tuple! This is a JSON encoded structure like this:

        {
            // The tuple's id
            "id": -6955786537413359385,
            // The id of the component that created this tuple
            "comp": 1,
            // The id of the stream this tuple was emitted to
            "stream": 1,
            // The id of the task that created this tuple
            "task": 9,
            // All the values in this tuple
            "tuple": ["snow white and the seven dwarfs"]
        }

* STDOUT: The results of your bolt. XXX Is this JSON encoded?
* STDOUT: sync or end XXX which one and why?
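The startup steps in the list above (PID directory, PID file, PID reply, configuration, topology context) could look roughly like this in a Python script. It is a sketch under assumptions: `handshake` and `read_msg` are hypothetical names, and it assumes every input, including the path, is followed by an "end" line as the notes state.

```python
import json
import os
import sys


def read_msg(stream):
    """Collect lines until a line that is exactly "end" and return them joined."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before 'end' was seen")
        line = line.rstrip("\n")
        if line == "end":
            return "\n".join(lines)
        lines.append(line)


def handshake(stdin=sys.stdin, stdout=sys.stdout):
    """Run the startup exchange and return (conf, context) as parsed JSON."""
    pid_dir = read_msg(stdin)  # plain string, not JSON
    pid = os.getpid()
    # Create the empty file named after our PID in the PID directory.
    open(os.path.join(pid_dir, str(pid)), "w").close()
    # Report the PID as a plain string, not JSON.
    stdout.write(f"{pid}\n")
    stdout.flush()
    conf = json.loads(read_msg(stdin))     # the Storm configuration
    context = json.loads(read_msg(stdin))  # the topology context
    return conf, context
```

After `handshake()` returns, the script would start reading tuples from STDIN.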
### sync
Note: This command is not JSON encoded, it is sent as a simple string.
This lets the parent bolt know that the script has finished processing and
is ready for another tuple.
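As an illustration of where `sync` fits, here is a minimal sketch of a processing loop that reads one tuple, hands it to a callback, and then writes `sync`. The function names `run` and `handle` are hypothetical, and the loop assumes each tuple arrives as JSON followed by an "end" line.

```python
import json
import sys


def read_msg(stream):
    """Collect lines until a line that is exactly "end" and return them joined."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before 'end' was seen")
        line = line.rstrip("\n")
        if line == "end":
            return "\n".join(lines)
        lines.append(line)


def run(handle, stdin=sys.stdin, stdout=sys.stdout):
    """Call handle(tuple_dict) for each incoming tuple, then signal readiness."""
    while True:
        try:
            raw = read_msg(stdin)
        except EOFError:
            return  # parent closed the pipe, so stop quietly
        tup = json.loads(raw)  # keys as in the example: id, comp, stream, task, tuple
        handle(tup)
        stdout.write("sync\n")  # plain string, not JSON
        stdout.flush()
```

For example, `run(lambda t: None)` would consume tuples and reply `sync` after each one without producing any results.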
### end
Note: This command is not JSON encoded, it is sent as a simple string.
This should be sent after any of the commands below, as it delimits messages.
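A script might wrap the delimiter in one helper so that it is never forgotten. This is a minimal sketch: `send_command` is a hypothetical name, and the dictionary passed in the usage line is a placeholder, since this document does not spell out the fields of each command.

```python
import json
import sys


def send_command(command, stream=sys.stdout):
    """Write a JSON encoded command followed by the plain "end" delimiter."""
    stream.write(json.dumps(command) + "\n")
    stream.write("end\n")  # not JSON, just the string on its own line
    stream.flush()
```

Calling `send_command({"command": "ack", "id": 1})` would write the JSON on one line and `end` on the next.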
## Commands
Commands are JSON encoded instructions sent back from the script.
### ack
Acknowledge a tuple. Acking a tuple lets Storm know that you have processed it.
### emit
Emit a tuple.
### fail
Fail a tuple. Failing a tuple will cause Storm to consider it unprocessed.
### log
This command allows you to send information back to Storm for logging. This
has nothing to do with tuple processing and is purely informational.
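The ack, fail, and log commands could be built as small dictionaries before being JSON encoded. This is a sketch only: the field names `command`, `id`, and `msg` are assumptions based on the Python helper module that ships with Storm, not something this document specifies, and the function names are hypothetical. The emit command is left out because its fields are not described here.

```python
import json


def ack_command(tuple_id):
    """Assumed shape of an ack for the tuple with the given id."""
    return {"command": "ack", "id": tuple_id}


def fail_command(tuple_id):
    """Assumed shape of a fail for the tuple with the given id."""
    return {"command": "fail", "id": tuple_id}


def log_command(message):
    """Assumed shape of a log message sent back to Storm."""
    return {"command": "log", "msg": message}


def encode(command):
    """JSON encode a command and append the "end" delimiter line."""
    return json.dumps(command) + "\nend\n"
```

A script would write `encode(ack_command(tup["id"]))` to STDOUT once it has finished with a tuple, and `encode(fail_command(tup["id"]))` if processing failed.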