Sending Gmail email with Java

Posted on August 15, 2016 by admin

This is another post I had been wanting to share for a while, partly because of how simple it is: in a previous project, a solution of this kind helped us solve a specific problem. To give you some context, our company had to send out information that we collected daily. As you might guess, we were physically far away and on a different network, and, as was to be expected, we had no permission to deliver the information over FTP or SSH. Given those requirements and the little time we had, we came up with the idea: "What if we send the information by email?" And so we did: we built a small Java program that ran periodically and sent the files we produced each day as attachments.

First of all, we need to enable access for less secure apps on the Gmail account. To do so, sign in to the Gmail account, go to the account icon in the upper right corner, click it, and then click the "My Account" button. There we will see the following image:

[Image: my Gmail account]

As the image above shows (highlighted in red), we need to click the "Sign-in & security" section. There we scroll to the bottom and turn on the option "Allow less secure apps", as in the image below.

[Image: enabling access for less secure apps]

With that done, let's move on to the program, which uses JavaMail. The program is quite simple. It relies on a configuration file like the following:

gmail.account=correo@gmail.com
gmail.password=password
emaildestinations=foo@domain1.com;bar@domain2.com;email@gmail.com
attachmentfiles=path_to_file1;path_to_file2

gmail.account: The Gmail account from which we will send the email.

gmail.password: The password of the Gmail account from which we will send the email.

emaildestinations: List of email addresses, separated by semicolons (";"), to which the email will be sent.

attachmentfiles: List of paths to the files to attach, separated by semicolons (";").
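
Since two of these properties hold semicolon-separated lists, it can help to see how such a value might be parsed defensively. The following is only a minimal sketch: the class MailConfigSketch and the method readList are hypothetical names that do not exist in the original project, and the trimming and skipping of empty entries is an assumption about what we would want, not something the program below does.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the original project.
final class MailConfigSketch {

    // Splits a semicolon-separated property into trimmed, non-empty entries.
    static List<String> readList(Properties config, String key) {
        String raw = config.getProperty(key);
        if (raw == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        List<String> values = new ArrayList<>();
        for (String part : raw.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Inline sample standing in for config.properties (assumed content).
        String sample = "emaildestinations=foo@domain1.com; bar@domain2.com;\n"
                + "attachmentfiles=path_to_file1;path_to_file2\n";
        Properties config = new Properties();
        config.load(new StringReader(sample));

        System.out.println(readList(config, "emaildestinations"));
        System.out.println(readList(config, "attachmentfiles"));
    }
}
```

Failing fast on a missing key gives a clearer error than the NullPointerException that calling split on a null property value would produce.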

Below is the Java program responsible for sending the email.

package com.josedeveloper;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import javax.activation.DataHandler;
import javax.activation.DataSource;
import javax.activation.FileDataSource;
import javax.mail.BodyPart;
import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.PasswordAuthentication;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;

public class SendMail {

	public static void main(String[] args) throws IOException {
		
		Properties props = new Properties();
		props.put("mail.smtp.host", "smtp.gmail.com");
		props.put("mail.smtp.socketFactory.port", "465");
		props.put("mail.smtp.socketFactory.class", "javax.net.ssl.SSLSocketFactory");
		props.put("mail.smtp.auth", "true");
		props.put("mail.smtp.port", "465");
		
		String resourceName = "config.properties";
		ClassLoader loader = Thread.currentThread().getContextClassLoader();
		Properties config = new Properties();
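		try (InputStream resourceStream = loader.getResourceAsStream(resourceName)) {
			// getResourceAsStream returns null when the file is not on the classpath
			if (resourceStream == null) {
				throw new IOException("Resource not found: " + resourceName);
			}
			config.load(resourceStream);
		}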

		final String gmailAccount = config.getProperty("gmail.account");
		final String gmailPassword = config.getProperty("gmail.password");
		final String[] emailDestinations = config.getProperty("emaildestinations").split(";");
		final String[] attachmentFiles = config.getProperty("attachmentfiles").split(";");

		Session session = Session.getDefaultInstance(props,
			new javax.mail.Authenticator() {
				protected PasswordAuthentication getPasswordAuthentication() {
					return new PasswordAuthentication(gmailAccount,gmailPassword);
				}
			});

		try {

			Message message = new MimeMessage(session);
			message.setFrom(new InternetAddress(gmailAccount));
			
			for (String emailDestination : emailDestinations) {
				message.addRecipients(Message.RecipientType.TO, InternetAddress.parse(emailDestination));
			}

			message.setSubject("Email Subject - Asunto del correo electronico");

			// Text body of the email
			BodyPart messageBodyPart = new MimeBodyPart();
			messageBodyPart.setText("Email text Body - Texto o cuerpo del correo electronico");

			// Add the text body first so it precedes the attachments in the message
			Multipart multipart = new MimeMultipart();
			multipart.addBodyPart(messageBodyPart);
			for (String attachmentFile : attachmentFiles) {
				addAttachment(multipart, attachmentFile);
			}

			// Use the multipart (text plus attachments) as the message content
			message.setContent(multipart);

			Transport.send(message);

			System.out.println("Correo enviado");

		} catch (MessagingException e) {
			throw new RuntimeException(e);
		}

	}
	
	private static void addAttachment(Multipart multipart, String filePath) throws MessagingException
	{
		File file = new File(filePath);
	    DataSource source = new FileDataSource(file);
	    BodyPart messageBodyPart = new MimeBodyPart();        
	    messageBodyPart.setDataHandler(new DataHandler(source));
	    messageBodyPart.setFileName(file.getName());
	    multipart.addBodyPart(messageBodyPart);
	}

}
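
The program above connects over implicit SSL on port 465. As a variation, Gmail's SMTP server also accepts STARTTLS on port 587. A minimal sketch of the session properties for that case could look like the following. The class SmtpPropsSketch and the method gmailStartTls are made-up names for illustration, and the sketch only builds the Properties object; it would still have to be passed to the JavaMail Session as in the program above.

```java
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the original project.
final class SmtpPropsSketch {

    // Builds JavaMail session properties for Gmail over STARTTLS (port 587).
    static Properties gmailStartTls() {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.gmail.com");
        props.put("mail.smtp.port", "587");
        props.put("mail.smtp.auth", "true");
        // Upgrade the plain connection to TLS before authenticating.
        props.put("mail.smtp.starttls.enable", "true");
        return props;
    }

    public static void main(String[] args) {
        // Print the properties so they can be compared with the SSL variant.
        gmailStartTls().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```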

The example is quite simple, but it shows how to set the email subject and the message text, as well as how to attach files. I hope you find it useful. Here is the link to the project on GitHub.

First steps with Apache Spark 2

Posted on August 10, 2016 by admin

A few days ago the long-awaited version 2 of Apache Spark was released. As some of you know, it is a framework that currently holds a lot of my attention, so, naturally, I put together a small project where I want to keep adding simple Spark examples covering the new (and not so new) features.

To begin with, I should mention that I have not used sbt yet; instead I use Maven as my build tool. Here are the first changes needed to work with Spark 2: the dependencies for the new version (declared in pom.xml).

   <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>2.0.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>2.0.0</version>
    </dependency>
    
   ...

One of the changes in Spark 2 is that the entry point for Spark programs is no longer HiveContext or SQLContext; both have been subsumed into a class called SparkSession. The HiveContext and SQLContext classes have been kept for backward compatibility. For example:

  import org.apache.spark.sql.SparkSession

  // In Spark 1.6.2 and earlier we would have done it like this:
  //  val conf = new SparkConf().setMaster("local").setAppName("Simple Application")
  //  val sc = new SparkContext(conf)
  //  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  val sparkSession = SparkSession.builder
    .master("local")
    .appName("Simple Application")
    .getOrCreate()

With SparkSession we do the same things we used to do with sqlContext, for example obtaining a Dataset:

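  // Get a Dataset of type People (this needs a People case class and
  // `import sparkSession.implicits._` in scope to provide the encoder)
  val ds = sparkSession.read.json("src/main/resources/people.json").as[People]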

Or, alternatively, obtaining a DataFrame:

val df = sparkSession.read
    .format("com.databricks.spark.csv")
    .option("header", "true") // Use the first line of each file as the header
    .option("inferSchema", "true") // Automatically infer data types
    .load("src/main/resources/datos.csv")

Another important change in version 2 is the unification of the Dataset and DataFrame classes (for Java and Scala). What does this mean? Simply that only the Dataset class exists now, but it provides the same functionality the DataFrame class gave us (in Scala, DataFrame remains available as a type alias for Dataset[Row]). In fact, comparing the API in versions 1.6.2 and 2.0.0 is enough to see that the methods of the DataFrame class are now included in the Dataset class.

[Image: Dataset and DataFrame in Spark 2]

Those interested in reading more about Dataset and DataFrame can visit this link.

These are not the only changes in Spark. There are many more, covering optimizations at the compilation and execution level as well as a new SQL parser. To read more about what is new in Spark 2, click here.

Here is the link to the project, where I will keep adding classes and trying out more of the new things in Spark.

Graph representation of my Twitter account

Posted on July 15, 2016 by admin

Hello again, everyone. This is a short post that I had been wanting to share with you for a long time. About a year ago, for a course on social network analysis, I built a mini project that consisted of analyzing the activity of my Twitter network and producing a graph representation of my account's activity: that is, whom I follow, plus the mentions and hashtags used by me and by those I follow, including the number of times each hashtag has been used.

The only things I did before uploading the code to GitHub were updating Neo4j to version 2.3.6 (the graph database where the relationships are stored) and removing my Twitter API credentials.

Before running this application you will need to generate a token and follow the steps required to use the Twitter API. It is also worth noting that the program stores the relationships between entities in an embedded Neo4j database. To visualize the final result of all those stored relationships, I simply used the default browser/visualizer that ships with Neo4j (which, if I am not mistaken, is built with d3.js). So let's get to work: start by downloading and installing Neo4j 2.3.6 from the following link, and once the installation is done we can move on to the source code.

Below is the main class:

package com.josedeveloper.twitter;

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

import javax.json.Json;
import javax.json.JsonArray;
import javax.json.JsonObject;
import javax.json.JsonReader;
import javax.json.JsonValue;

import oauth.signpost.OAuthConsumer;
import oauth.signpost.commonshttp.CommonsHttpOAuthConsumer;
import oauth.signpost.exception.OAuthCommunicationException;
import oauth.signpost.exception.OAuthExpectationFailedException;
import oauth.signpost.exception.OAuthMessageSignerException;

import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClientBuilder;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class TwitterRelationshipApp {
	
	static final String TWITTER_DB_PATH = "DATABASE_PATH";
	static final String CONSUMER_KEY = "YOUR_CONSUMER_KEY";
	static final String CONSUMER_SECRET = "YOUR_CONSUMER_SECRET";
	static final String ACCESS_TOKEN = "YOUR_ACCESS_TOKEN";
	static final String ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET";
	
	private final GraphDatabaseService graphDB;
	private final Set<String> totalUsers;
	private final String account;
	private final int count;
	
	enum NodeType implements Label {
		TWITTER_USER, HASHTAG;
	}
	
	enum Relationships implements RelationshipType {
		USE, MENTION;
	}
	
	public TwitterRelationshipApp(final String account, final int count) {
		this.account = account;
		this.count = count;
		
		totalUsers = new HashSet<>();
		graphDB = new GraphDatabaseFactory().newEmbeddedDatabase(new File(TWITTER_DB_PATH));
	}
	
    public static void main(String[] args) throws OAuthMessageSignerException, OAuthExpectationFailedException, OAuthCommunicationException, ClientProtocolException, IOException
    {	
    	TwitterRelationshipApp app = new TwitterRelationshipApp("josedeveloper", 100);
    	app.registerShutdownHook();
    	app.insertUsers();
    	app.insertUserMentionsRelationshipsByUser();
    }

	private void insertUsers() throws OAuthMessageSignerException, OAuthExpectationFailedException, OAuthCommunicationException, ClientProtocolException, IOException {
    	OAuthConsumer oAuthConsumer = new CommonsHttpOAuthConsumer(CONSUMER_KEY, CONSUMER_SECRET);
		oAuthConsumer.setTokenWithSecret(ACCESS_TOKEN, ACCESS_TOKEN_SECRET);

		HttpGet httpGet = new HttpGet("https://api.twitter.com/1.1/friends/list.json?screen_name=" + account + "&count=" + count); //those who I follow
		
		oAuthConsumer.sign(httpGet);

		HttpClient httpClient = HttpClientBuilder.create().build();
		HttpResponse httpResponse = httpClient.execute(httpGet);

		//int statusCode = httpResponse.getStatusLine().getStatusCode();
		
		JsonReader reader = Json.createReader(httpResponse.getEntity().getContent());
		JsonObject root = reader.readObject();
		JsonArray users = root.getJsonArray("users");
		
		Iterator<JsonValue> iter = users.iterator();
		while (iter.hasNext()) {
			JsonObject user = (JsonObject) iter.next();
			
			try (Transaction tx = graphDB.beginTx()) {
				Node userNode = graphDB.createNode(NodeType.TWITTER_USER);
				
				userNode.setProperty("id", user.getString("id_str"));
				userNode.setProperty("name", user.getString("name"));
				userNode.setProperty("screen_name", user.getString("screen_name"));
				
				insertRelationshipsWithHashtagsByUser(userNode, graphDB);
				
				tx.success();
			} catch (Exception e) {
				System.out.println(e);
			}
			
			totalUsers.add(user.getString("screen_name"));
		}
	}
		
	private void registerShutdownHook() {
		
		
		    // Registers a shutdown hook for the Neo4j instance so that it
		    // shuts down nicely when the VM exits (even if you "Ctrl-C" the
		    // running application).
		Runtime.getRuntime().addShutdownHook(new Thread() {
			@Override
		    public void run() {
				graphDB.shutdown();
		    }
			
		});
	}
	
	
	private static void insertRelationshipsWithHashtagsByUser(Node user, final GraphDatabaseService db) throws OAuthMessageSignerException, OAuthExpectationFailedException, OAuthCommunicationException, ClientProtocolException, IOException {
		OAuthConsumer oAuthConsumer = new CommonsHttpOAuthConsumer(CONSUMER_KEY, CONSUMER_SECRET);
		oAuthConsumer.setTokenWithSecret(ACCESS_TOKEN, ACCESS_TOKEN_SECRET);
		
		HttpGet httpGet = new HttpGet("https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=" + user.getProperty("screen_name"));
		oAuthConsumer.sign(httpGet);
		
		HttpClient httpClient = HttpClientBuilder.create().build();
		HttpResponse httpResponse = httpClient.execute(httpGet);
		
		//int statusCode = httpResponse.getStatusLine().getStatusCode();
		
		JsonReader timelineReader = Json.createReader(httpResponse.getEntity().getContent());
		JsonArray tweets = timelineReader.readArray();
		Iterator<JsonValue> tweetsIter = tweets.iterator();
		Map<String, Integer> usedHashtags = new HashMap<>();
		while(tweetsIter.hasNext()) {
			JsonObject tweet = (JsonObject) tweetsIter.next();
			
			JsonObject entities = tweet.getJsonObject("entities");
			JsonArray hashtags = entities.getJsonArray("hashtags");
			Iterator<JsonValue> hashtagsIter = hashtags.iterator();
			
			while (hashtagsIter.hasNext()) {
				String hashtag = ((JsonObject) hashtagsIter.next()).getString("text");
				
				if (usedHashtags.containsKey(hashtag)) {
					Integer counter = usedHashtags.get(hashtag);
					usedHashtags.put(hashtag, ++counter);
				} else{
					usedHashtags.put(hashtag, Integer.valueOf(1));
				}
			}
			
		}
		
		for (String hashtag : usedHashtags.keySet()) {
			
			try (Transaction tx = db.beginTx()) {
				Node hashtagNode = db.findNode(NodeType.HASHTAG, "text", hashtag);
				if (hashtagNode == null)
					hashtagNode = db.createNode(NodeType.HASHTAG);				
				
				hashtagNode.setProperty("text", hashtag);
				
				Integer timesUsed = usedHashtags.get(hashtag);
				Relationship use = user.createRelationshipTo(hashtagNode, Relationships.USE);
				use.setProperty("times", timesUsed);

				tx.success();
			} catch (Exception e) {
				System.out.println(e);
			}
		}
		
		
	}
	
	private void insertUserMentionsRelationshipsByUser() throws OAuthMessageSignerException, OAuthExpectationFailedException, OAuthCommunicationException, ClientProtocolException, IOException {
		OAuthConsumer oAuthConsumer = new CommonsHttpOAuthConsumer(CONSUMER_KEY, CONSUMER_SECRET);
		oAuthConsumer.setTokenWithSecret(ACCESS_TOKEN, ACCESS_TOKEN_SECRET);
		
		for (String twitterUser : totalUsers) {
			
			HttpGet httpGet = new HttpGet("https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=" + twitterUser);
			oAuthConsumer.sign(httpGet);
			
			HttpClient httpClient = HttpClientBuilder.create().build();
			HttpResponse httpResponse = httpClient.execute(httpGet);
			
			//int statusCode = httpResponse.getStatusLine().getStatusCode();
			
			JsonReader timelineReader = Json.createReader(httpResponse.getEntity().getContent());
			JsonArray tweets = timelineReader.readArray();
			Iterator<JsonValue> tweetsIter = tweets.iterator();
			Map<String, Integer> userMentionsDone = new HashMap<>();
			while(tweetsIter.hasNext()) {
				JsonObject tweet = (JsonObject) tweetsIter.next();
				
				JsonObject entities = tweet.getJsonObject("entities");
				JsonArray userMentions = entities.getJsonArray("user_mentions");
				Iterator<JsonValue> mentionsIter = userMentions.iterator();
				
				while (mentionsIter.hasNext()) {
					String userMentioned = ((JsonObject) mentionsIter.next()).getString("screen_name");
					
					if (totalUsers.contains(userMentioned)) {
						if (userMentionsDone.containsKey(userMentioned)) {
							Integer counter = userMentionsDone.get(userMentioned);
							userMentionsDone.put(userMentioned, ++counter);
						} else{
							userMentionsDone.put(userMentioned, Integer.valueOf(1));
						}
					}
				}
				
			}
			
			for (String userMentionDone : userMentionsDone.keySet()) {
				
				try (Transaction tx = graphDB.beginTx()) {
					Node twitterUserMentionedNode = graphDB.findNode(NodeType.TWITTER_USER, "screen_name", userMentionDone);
					Node twitterUserNode = graphDB.findNode(NodeType.TWITTER_USER, "screen_name", twitterUser);
					
					Integer timesMentioned = userMentionsDone.get(userMentionDone);
					Relationship use = twitterUserNode.createRelationshipTo(twitterUserMentionedNode, Relationships.MENTION);
					use.setProperty("times", timesMentioned);

					tx.success();
				} catch (Exception e) {
					System.out.println(e);
				}
			}
		}
		
	}
	
}
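
Both counting loops in the class above (hashtags and user mentions) follow the same pattern: look the key up, then either increment it or store 1. On Java 8 and later the same tally can be written more compactly with Map.merge. The following is just a sketch under that assumption; the class HashtagCountSketch and the method countOccurrences are hypothetical names that do not appear in the project, and the input list stands in for the hashtag texts extracted from the tweets.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the original project.
final class HashtagCountSketch {

    // Counts how many times each item appears in the list.
    static Map<String, Integer> countOccurrences(List<String> items) {
        Map<String, Integer> counts = new HashMap<>();
        for (String item : items) {
            // Stores 1 for a new key, otherwise adds 1 to the current count.
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample hashtag texts (made-up data).
        List<String> hashtags = Arrays.asList("java", "spark", "java", "neo4j", "java");
        System.out.println(countOccurrences(hashtags));
    }
}
```

The resulting map has the same shape as usedHashtags and userMentionsDone, so the loops that create the relationships in Neo4j would not need to change.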

You can find the complete project code at the following link. Once the application has run, the path specified in TWITTER_DB_PATH will contain a folder with a .db extension where the relationships (the whole graph) are stored. To visualize the graph, the next step is to edit the file RUTA_INSTALACION_NEO4J/conf/neo4j-server.properties (RUTA_INSTALACION_NEO4J stands for the Neo4j installation directory) and set the path where the database is located:

org.neo4j.server.database.location=TWITTER_DB_PATH.db

TWITTER_DB_PATH = the path where the Neo4j database that stores the relationships is created.

Next we start the database, which is quite simple; just run the following command:

RUTA_INSTALACION_NEO4J/bin/neo4j start

Once that is done, open http://localhost:7474 in a browser (Chrome or Firefox, for example) and you will see the Neo4j web client.

[Image: neo4j]

Then, in the console where queries are run (where the $ symbol appears), execute the following command to visualize the whole graph.

MATCH (n) RETURN n

In my case I got the following:

[Image: Neo4j relationships]

As you can see, the different relationship types (USE and MENTION) between the nodes can be told apart. There are also two types of nodes: the blue ones are Twitter accounts and the green ones are hashtags.

Another interesting point is that the Neo4j visualizer lets us inspect the data stored on the relationships, such as the number of times a Twitter account has used a hashtag or mentioned another account, as in the following image:

[Image: Number of times a hashtag is used]

As can be seen at the bottom of the image, the account Ben & Martijn has used the hashtag #Java 2 times.

Well, I have nothing more to show on this topic. If you find it interesting, run this example and look at the activity of your own Twitter account; I am sure you will find things that catch your attention, and if you like, share them with everyone else.

Finally, I will try as far as possible to update this code so that it works with Neo4j version 3 or later, improve the code (make it clearer) and update it to Java 8. I am all ears for any comments and/or suggestions.
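As a small taste of what the Java 8 update might look like, the counters behind the "times" property (how often a hashtag was used or an account was mentioned) could be kept with Map.merge instead of the containsKey/get/put sequence. This is only a sketch: MentionCounter and countOccurrences are hypothetical names, not part of the project.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original project.
class MentionCounter {

    // Counts how many times each name (hashtag or screen name) appears in the list.
    static Map<String, Integer> countOccurrences(List<String> names) {
        Map<String, Integer> counts = new HashMap<>();
        for (String name : names) {
            // merge() stores 1 for a new key, otherwise adds 1 to the current value
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> hashtags = Arrays.asList("Java", "Neo4j", "Java");
        System.out.println(countOccurrences(hashtags));
    }
}
```

The resulting map could then be walked exactly as before to create one relationship per key and store its counter in the "times" property.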

First example with Apache Storm

Posted on July 10, 2016 by admin

Apache Storm is a framework for distributed event processing. Companies such as Twitter used Storm from 2011 onwards, although Twitter later replaced it with Heron in 2015.

I am currently working on an application built with Apache Storm, and I want to share with you my first example with it. It is quite simple, but it served its purpose, which was to get me started with the framework and to understand its main components.

We will need to download RabbitMQ version 3.5.7.

Once RabbitMQ is installed (it is just a matter of unpacking it), we will enable a plugin (a web client) that makes it easy to monitor the queues. Go to ruta_instalacion_rabbitmq/sbin/ (inside the RabbitMQ installation directory) and run the following command from there:

./rabbitmq-plugins enable rabbitmq_management

This command enables the web client, which lets us monitor the queues and the exchanges and even manipulate the queues: adding elements, removing elements and even purging them. Next, we start RabbitMQ with the following command:

./rabbitmq-server

Immediately afterwards, open http://localhost:15672 in a browser.

[Image: rabbitmq overview]

Once there, we create a queue for our example and call it "data".

[Image: adding a queue in RabbitMQ]

We should now be able to see the only queue in our system. By clicking on it we could even add messages through the graphical interface (try it if you like, and you will see each message you add being enqueued), but we will insert messages into the queue with a small Java program.

[Image: data queue]

Now a bit of theory about Apache Storm.

What is Apache Storm?

It is a distributed real-time computation framework, written mostly in Clojure. Just as Hadoop offers a set of general primitives for batch processing, Storm offers a set of general primitives for real-time computation. Storm is simple and can be used with any programming language.

Storm applications are built as topologies in the form of a DAG (Directed Acyclic Graph), with spouts and bolts acting as the vertices of the graph. The edges of the graph are called streams and carry data from one node to another. Together, the topology acts as a data transformation pipeline.

Spouts are the sources of streams in a topology. A spout generally reads tuples from an external source and emits them into the topology.

A bolt is where all the processing in a topology takes place. Bolts can do anything: filtering, functions, aggregations, joins, talking to databases and much more.
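To make the spout/bolt idea concrete before touching the real API, here is a toy model in plain Java. It is only a sketch: ToyTopology, ToySpout and run are hypothetical names and have nothing to do with Storm's classes. It simply shows a source handing out tuples one by one and a chain of processing steps applied to each of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Toy model only: these classes are hypothetical and are NOT the Storm API.
class ToyTopology {

    // "Spout": hands out items from an external source, or null when nothing is left.
    static class ToySpout {
        private final Iterator<String> source;

        ToySpout(List<String> external) {
            this.source = external.iterator();
        }

        String nextTuple() {
            return source.hasNext() ? source.next() : null;
        }
    }

    // Wires one spout to a chain of "bolts"; each bolt transforms the tuple it receives.
    static List<String> run(ToySpout spout, List<Function<String, String>> bolts) {
        List<String> out = new ArrayList<>();
        String tuple;
        while ((tuple = spout.nextTuple()) != null) {
            for (Function<String, String> bolt : bolts) {
                tuple = bolt.apply(tuple);
            }
            out.add(tuple);
        }
        return out;
    }

    public static void main(String[] args) {
        ToySpout spout = new ToySpout(Arrays.asList("hello", "storm"));
        List<Function<String, String>> bolts = new ArrayList<>();
        bolts.add(s -> s.toUpperCase()); // first bolt: a simple transformation
        bolts.add(s -> "--> " + s);      // second bolt: formats the tuple for display
        System.out.println(run(spout, bolts)); // prints [--> HELLO, --> STORM]
    }
}
```

In real Storm the spout and the bolts run in separate tasks, possibly on different machines, and the streams between them are managed by the framework; the toy version only captures the direction in which the data flows.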

The example I built and will share with you below is quite simple. It consists of a spout that reads from a RabbitMQ queue (the data queue we created earlier) and inserts each message into the topology; the bolt then receives the message and prints it to the command line (later, if you wish, instead of printing it you could write the message to another RabbitMQ queue).

The Java program that inserts messages into the queue

package com.josedeveloper.rabbitmq;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeoutException;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class SendMessagesToRabbit {

    public static final String message = "MENSAJE DE PRUEBA";
    public static final int NUM_MENSAJES = 10000;

    public static void main(String[] args) throws IOException, InterruptedException, TimeoutException {
        sendMessage();
    }

    public static void sendMessage() throws IOException, InterruptedException, TimeoutException {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        factory.setPort(5672);
        factory.setUsername("guest"); // default RabbitMQ user
        factory.setPassword("guest"); // default RabbitMQ password
        factory.setVirtualHost("/");

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Publish to the default exchange ("") using the queue name "data" as routing key
        for (int i = 0; i < NUM_MENSAJES; i++) {
            final String msg = message + i;
            // Encode as UTF-8 explicitly, since the spout decodes the body as UTF-8
            channel.basicPublish("", "data", null, msg.getBytes(StandardCharsets.UTF_8));
        }

        channel.close();
        connection.close();
    }

}

The spout that reads from the RabbitMQ queue

package com.josedeveloper.topologia;

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeoutException;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.Consumer;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;

public class RabbitMQSpout extends BaseRichSpout {

    private static final long serialVersionUID = -5875062340173997062L;

    private SpoutOutputCollector collector;
    // Buffer between the RabbitMQ consumer callback and nextTuple(); created in open()
    private BlockingQueue<String> messages;

    private final static String QUEUE_NAME = "data";

    @Override
    public void nextTuple() {
        String message;
        // poll() returns null when the buffer is empty, so this loop never blocks
        while ((message = messages.poll()) != null) {
            collector.emit(new Values(message)); // emit the message into the topology
        }
    }

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        messages = new ArrayBlockingQueue<>(100);
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection;

        try {
            connection = factory.newConnection();
            Channel channel = connection.createChannel();

            channel.queueDeclare(QUEUE_NAME, true, false, false, null);

            Consumer consumer = new DefaultConsumer(channel) {
                @Override
                public void handleDelivery(String consumerTag, Envelope envelope, AMQP.BasicProperties properties, byte[] body)
                        throws IOException {
                    String message = new String(body, "UTF-8");
                    try {
                        messages.put(message); // blocks while the buffer is full
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                        Thread.currentThread().interrupt(); // keep the interrupt flag set
                    }
                }
            };
            // autoAck = true: messages are acknowledged as soon as they are delivered
            channel.basicConsume(QUEUE_NAME, true, consumer);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (TimeoutException e1) {
            e1.printStackTrace();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("message")); // declare the fields we send into the topology
    }

}
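The spout relies on a bounded BlockingQueue to pass messages from the RabbitMQ consumer callback to nextTuple(). The stand-alone sketch below isolates that handoff so it can be tried without RabbitMQ or Storm. QueueHandoffDemo and drain are hypothetical names, and the producer thread merely plays the role that I assume the client library's delivery thread has in the real spout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-alone demo of the buffer used between handleDelivery() and nextTuple().
class QueueHandoffDemo {

    // Same idea as nextTuple(): take whatever is buffered right now, never wait.
    static List<String> drain(BlockingQueue<String> buffer) {
        List<String> emitted = new ArrayList<>();
        String message;
        while ((message = buffer.poll()) != null) {
            emitted.add(message);
        }
        return emitted;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(100);

        // Plays the role of handleDelivery(): runs on another thread and uses put(),
        // which would block if the buffer were full.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    buffer.put("MENSAJE DE PRUEBA" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        producer.join(); // wait for the producer so the demo is deterministic

        System.out.println(drain(buffer).size() + " messages drained"); // prints 5 messages drained
        System.out.println(drain(buffer).isEmpty());                    // prints true
    }
}
```

The choice of put() on one side and poll() on the other is what matters here: the producer slows down when the buffer is full, while the draining side returns immediately when there is nothing to emit.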

The bolt that reads the data inserted into the topology by the spout

package com.josedeveloper.topologia;

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class MessageBolt extends BaseRichBolt {

	private static final long serialVersionUID = 1L;
	
	@SuppressWarnings("unused")
	private OutputCollector collector;
	
	@Override
	public void execute(Tuple tuple) {
		String message = tuple.getString(0);
		System.out.println("--> " + message);
	}

	@Override
	public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
		this.collector = collector;
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer arg0) {
		// TODO Auto-generated method stub

	}

}

Definition of the topology

package com.josedeveloper.topologia;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.utils.Utils;

public class RabbitMQTopologyExample {

	public static void main(String[] args) {
		TopologyBuilder builder = new TopologyBuilder();

        builder.setSpout("spout", new RabbitMQSpout());
        builder.setBolt("bolt", new MessageBolt())
                .shuffleGrouping("spout");

        Config conf = new Config();
        
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("test", conf, builder.createTopology());
        Utils.sleep(200000);
        cluster.killTopology("test");
        cluster.shutdown();
	}

}

To run our example and see it working we need to run 2 classes (in either order):

SendMessagesToRabbit
RabbitMQTopologyExample

When we run the SendMessagesToRabbit class, the RabbitMQ web client will show the queue holding 10000 enqueued messages. When we run the topology (by running the RabbitMQTopologyExample class) we will see the messages being dequeued, and on the command line (for example, the console of our editor) we will see the messages that the bolt read and processed.

I hope you find this useful and enjoy this framework. In my view it is simple and works well; even the new tool Twitter uses is backward compatible with Storm, so an example like this is a reasonable place to start.

Click here to go to the GitHub repository.

Web scraping with Java

Posted on July 25, 2015 by admin

I am currently working on my master's final project, which consists of building a machine learning model that produces predictions about football matches in the Spanish first division. For that I needed, among other things, the results of every matchday of the most recent seasons. Although I recently found an R package (link) containing the results since 1929, it did not give me all the information I was looking for, so I decided to collect that information myself by extracting it from sports pages, and that is what I want to share with you.

At first I thought of doing it in Python with the lxml library, but a quick search on the Internet turned up a Java project called Jsoup, and I must say it really simplified the task.

First, as everyone knows, we need to review the structure of the document we are going to scrape and confirm that there is a pattern.

As you can see in the image, all the rows of the matchday tables share the attribute itemtype="http://schema.org/SportsEvent", so that is the one I used to get all the rows and, from there, the team names, the result and the link to the match details.

package com.josedeveloper.WebScrapingExample;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class App
{
    public static void main( String[] args ) throws IOException
    {
        final String url = "http://resultados.as.com/resultados/futbol/primera/2014_2015/calendario";

        Document doc = Jsoup.connect(url).get();

        // Get all the rows marked as a sports event,
        // since that attribute is how matches are identified
        Elements matches = doc.select("tr[itemtype$=\"http://schema.org/SportsEvent\"]");

        for (Element match: matches) {

            // Get the teams of each match, also using a selector expression
            Elements teams = match.select("td[itemtype$=\"http://schema.org/SportsTeam\"]");

            // Get the link to the match details
            Elements score = match.select("a[class=\"resultado resul_post\"]");

            String localTeam = teams.get(0).text();
            String visitorTeam = teams.get(1).text();
            String statsLink = score.first().attr("href");

            // The link text holds the score in the form "local-visitor"
            String[] goals = score.first().text().split("-");
            int localGoals = Integer.parseInt(goals[0].trim());
            int visitorGoals = Integer.parseInt(goals[1].trim());

            System.out.println(localTeam + " vs " + visitorTeam + ": " + localGoals + "-" + visitorGoals + " -> " + statsLink);
        }
    }
}

A couple of things I would like to point out about the code:

We can apply expressions to elements we have already obtained, as was done, for example, to get the teams playing in the match.
Besides the approach explained above (using expressions), there is another way to get elements from the page's DOM tree. If we take a look at the library's API, there is a method getElementsByAttributeValue, so the score element could also have been obtained as follows:
Elements score = match.getElementsByAttributeValue("class", "resultado resul_post");
Finally, if we wanted more data, for example from the match details (since we managed to get the URL), the library lets us keep navigating (making connections); it would just be a matter of opening another connection and extracting elements again.
Document detalleDelPartido = Jsoup.connect(statsLink).get();
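One fragile spot in the listing is the score parsing: splitting on "-" and calling Integer.parseInt throws as soon as the link text is not a plain score. The sketch below is a more defensive variant using only the JDK. ScoreParser is a hypothetical helper, and the idea that some rows may carry something other than a score (a kick-off time, for instance) is an assumption of mine, not something I checked on the page.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original project or of Jsoup.
class ScoreParser {

    // Up to three digits, a dash, up to three digits, with optional spaces around each part.
    private static final Pattern SCORE = Pattern.compile("\\s*(\\d{1,3})\\s*-\\s*(\\d{1,3})\\s*");

    // Returns {localGoals, visitorGoals}, or an empty Optional when the text is not a score.
    static Optional<int[]> parse(String text) {
        Matcher m = SCORE.matcher(text);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) });
    }

    public static void main(String[] args) {
        System.out.println(parse("2 - 1").map(g -> g[0] + "-" + g[1]).orElse("not a score")); // prints 2-1
        System.out.println(parse("21:00").map(g -> g[0] + "-" + g[1]).orElse("not a score")); // prints not a score
    }
}
```

Inside the loop, the text of score.first() would be passed to parse, and rows that yield an empty Optional could simply be skipped.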

Here is the link to the GitHub repository; I hope you find it useful.
