Java 9 Features

Just like Java 8 is known as the major release of lambdas, streams and API changes, similarly Java 9 is all about project Jigsaw, utilities and changes under the hood. In this post, I would like to talk about some of the most exciting features being targeted for Java 9. The full list of new features is available here.

Modular System – Jigsaw Project

This is the biggest addition to the JDK and would bring modularity to the Java platform. An increase in codebase, quite often leads to complicated, tangled “spaghetti code”. It makes it hard to encapsulate code and there is no clear dependencies between different parts (JAR files) of a system. Any public class can access any other public class based on the classpath. This can lead to inadvertent usage of classes that were not supposed to be public API. In fact, the classpath itself is not an elegant way of verifying if the required JARs are present or it there are any duplicate entries. These challenges are handled very well by the module system.

The capabilities of a modular system are quite similar to those of OSGi framework. Modules have an inherent concept of dependencies, can expose a public API and keep implementation details hidden/private. The biggest motivation is to provide modular JVM, which requires lesser memory footprint in order to run on devices. The JVM can then run with only those modules and APIs which are essential.

There is an additional module descriptor in modular JAR files. This module descriptor, has dependencies expressed through “requires” statements. In addition to this, “exports” statements control which packages are accessible to other modules.

Modular JAR files contain an additional module descriptor. In this module descriptor, dependencies on other modules are expressed through “requires” statements. Additionally, “exports” statements control which packages are accessible to other modules. All packages which are not exported are encapsulated in the module by default. Let’s see an example of a module descriptor, which lives in `module-info.java`:

module com.thistechnologylife.java9.modules.house {

requires com.thistechnologylife.java9.modules.furniture;

exports com.thistechnologylife.java9.modules.lawn;

}

The module house requires module furniture and exports a package for lawn.

JShell – The Interactive Java REPL

Jaba 9 comes with a new tool called “jshell”, which stands for Java Shell and also known as REPL (Read Evaluate Print Loop). It can be used to execute and test any Java Constructs like class, interface, enum, object, statements etc. very easily without wrapping them in a separate method or project.

JShell can be launched directly from the console and you can start typing and executing Java code. One great example of jshell is to test regular expressions.

The jshell executable itself can be found in <JAVA_HOME>/bin folder:

jdk-9\bin>jshell.exe

| Welcome to JShell -- Version 9

| For an introduction type: /help intro

jshell> "Say Hello To Java 9 Features".substring(13,19);

$5 ==> "Java 9"

Improved Network Communication with HTTP/2.0 Support

Java 9 comes with a new way of performing HTTP based communication. It provides a long-awaited replacement of the old HttpURLConnection and supports both WebSockets and HTTP/2. The new API is located under the java.net.http package.

HttpClient client = HttpClient.newHttpClient();

HttpRequest request =

HttpRequest.newBuilder(URI.create("http://www.amazon.com"))

.header("User-Agent","Java")

.GET()

.build();

HttpResponse<String> response = client.send(request, HttpResponse.BodyHandler.asString());

HttpClient also provides new APIs to deal with HTTP/2 features such as streams and server push.

Enhanced Process API

Process API had a limited capability to control and manage operating system processes. Even for getting process PID, you would need to either use native code or a tricky workaround. Not to forget, you would need different implementation for each platform.

In Java 9, the process API has been enhanced for controlling and managing operating-system processes. Most of the new functionality is present in the class java.lang.ProcessHandle. Process specific information can be obtained in the following manner:

ProcessHandle procHandle = ProcessHandle.current();

long PID = procHandle.getPid();

ProcessHandle.Info processInfo = procHandle.info();

Optional<String[]> args = processInfo.arguments();

Optional<String> cmd = processInfo.commandLine();

Optional<Instant> startTime = processInfo.startInstant();

Optional<Duration> cpuUsage = processInfo.totalCpuDuration();

As the example illustrates, the current method returns an object representing a process of currently running JVM. The Info subclass provides details about the process.

Now let’s see how we can stop all running processes using destroy method:

childProcess = ProcHandle.current().children();

childProcess.forEach(procHandle -> {

assertTrue("Could not kill process " + procHandle.getPid(), procHandle.destroy());

});

Stream API improvements

Java 9 brings significant improvements to the Stream API which packs a punch in creation of declarative pipelines of transformations on collections. Four new methods have been added to the java.util.Stream interface – dropWhile, takeWhile, ofNullable. The iterate method gets a new overload, which helps in providing a Predicate on when to stop iterating.

The takeWhile method takes a predicate as an argument and returns a Stream of subset of the given Stream values until that Predicate returns false for first time. If first value does NOT satisfy that Predicate, it just returns an empty Stream.

In this post I have tried to give a glimpse of some of the newly introduced features in Java 9. Trust me, we have just scratched the surface and there are many more useful and diverse features which would be available with Java 9. I am quite excited about this. What about you?

Evolution of Java Security Architecture

Java platform had a strong emphasis on security from the very beginning.  The multi-platform support has made people sceptical about the way Java handles security aspects and if there are some hidden risks. Java is a type-safe language and provides automatic garbage collection. The use of secure class loading and verification mechanism allows only legitimate Java code to be executed. The initial version of Java platform was focused specifically on providing a safe environment for running potentially untrusted code, such as Java applets. With the growth in the platform and wide spread deployment, the Java security architecture has evolved to support the expanding set of services. Over the years, the architecture has grown to include a large set of APIs, tools and implementations of security algorithms, mechanisms and protocols. Java security revolves around two key aspects:

  • Secure platform to run applications on.
  • Tools and services available in the Java programming language that facilitate a range of security sensitive applications

Language Level Security

There are multiple mechanisms inbuilt within the Java language to ensure secure programming:

  • Strictly Object-Oriented – All security related advantages of object oriented paradigm can be availed of as there can be no structures outside the boundary of a class.
  • Final Classes and Methods – Undesired modification of functionality can be prevented by making classes and methods as final.
  • Automated Memory Management – There are fewer chances of errors related to memory access as Java language does all the memory management automatically.
  • Automatic Initialization – All memory constructs on heap are automatically initialized. However, all stack based memory is not pre-initialized. This makes sure that all classes and instances are never set undefined values.

Basic Security Architecture

There are a set of APIs spanning all the major security areas like cryptography, public key infrastructure, authentication, secure communication and access control. These APIs help in easy integration of security in application code. They are designed around the following principles:

  • Implementation Independence – Application need not worry about the finer details of security and can request security from the Java platform. These security services are implemented in providers which get plugged into the Java platform via a standard interface. An application may leverage multiple independent providers for security.
  • Implementation Interoperability – By their very nature, providers are interoperable across applications. Just like an application is not bound to a specific provider, similarly a provider is not bound to any specific application.
  • Algorithm Extensibility – There are a number of built-in providers that implement a basic set of security services. There is a provision for installing custom providers which address specific needs of applications.

Security Providers

The concept of security provider is encapsulated by java.security.Provider class. It specifies the provider’s name and lists the security services it supports.  It is possible to configure multiple providers at the same time and can be listed in order of preference. When a security service is requested, the provider with the highest priority is selected.  Message digest creation is one type of service available from providers. An application can invoke getInstance method in the java.securityy.MessageDigest class to obtain an implementation of a specific message digest algorithm like MD5.

MessageDisgest md = MessageDigest.getInstance(“MD”);

Optionally, the provider name can be specified while requesting an implementation.

MessageDigest md = MessageDigest.getInstance(“MD5”, “ProviderA”);

Following figures illustrate these options for requesting an MD5 message digest implementation. Both the instances depict three providers that implement message digest algorithm. The providers are ordered by preference from left to right. In the first instance, an application can be seen requesting MD5 algorithm without specifying a provider name. The providers are looked for in preference order and the implementation from the first provider supplying that particular algorithm, ProviderB is returned. In second instance, the application requests the MD5 algorithm implementation from a specific provider, ProviderC. In this case, the implementation from that provider is returned, even though a provider with a higher preference order, ProviderB also supplies an MD5 implementation.

Figure – 1 – Provider Selection

Cryptography

Java provides a cryptography framework for accessing and developing cryptographic functionality for the Java platform. It includes APIs for a large variety of cryptographic services like Message digest algorithms, Digital signature algorithms, Symmetric stream encryption, Asymmetric encryption, etc. The cryptographic interfaces are provider-based which allows for multiple and interoperable cryptography implementations.

Public Key Infrastructure

The Public Key Infrastructure (PKI) framework enables secure exchange of information based on public key cryptography. It facilitates identities (of people, organizations, etc.) to be bound to digital certificates and provides a means of verifying the authenticity of certificates. PKI comprises of keys, certificates, public key encryption, and trusted Certification Authorities (CAs) who generate and digitally sign certificates.

Authentication

Authentication helps in identifying the user of an executing Java program. The Java platform provides APIs to perform user authentication via pluggable login modules.

Secure Communication

It is quite important to ensure that the data that travels across a network is not accessed by an unintended recipient. In case the data includes private information, like passwords and credit card numbers, steps must be taken to make the data unintelligible to unauthorized parties. The Java platform provides API support and provider implementations for a number of standard secure communication protocols like SSL/TLS, SASL, GSS-API and Kerberos.

Access Control

The access control architecture protects access to sensitive resources (for example, local files) or sensitive application code (for example, methods in a class). All access control decisions are mediated by a security manager, represented by the java.lang.SecurityManager class.  A  SecurityManager must be installed into the Java runtime in order to activate the access control checks.

Enhancements in JDK 8

JDK 8 has wide range of security related enhancements and features. Some of those are listed below:

TLS 1.1 and TLS 1.2 enabled by default

TLS 1.1 and TLS 1.2 are enabled by default on the client side by the SUNJSSE provider. You can enable SunJSSE protocols by using the new system property jdk.tls.client.protocols.

Limited doPrivileged

A new version of the method AccessController.doPrivileged can be used to assert a subset of its privileges, without preventing the full traversal of the stack to check for other permissions.

Stronger Algorithms for Password-Based Encryption

Many AES-based Password-Based Encryption (PBE) algorithms, such as PBEWithSHA256AndAES_128 and PBEWithSHA512AndAES_256, have been added to the SunJCE provider.

SSL/TLS Server Name Indication (SNI) Extension Support in JSSE Server

The SNI extension extends the SSL/TLS protocols to indicate what server name the client is attempting to connect to during handshaking. Servers can use server name indication information to decide if specific SSLSocket or SSLEngine instances should accept a connection. JDK 7 already has SNI extension enabled by default. JDK 8 supports the SNI extension for server applications.

KeyStore Enhancements

A new command option – importpassword facilitates accepting a password and store it securely as a secret key. A new class, java.security.DomainLoadStoreParameter is added to support DKS keystore type.

SHA-224 Message Digests

JDK 8 has enhanced cryptographic algorithms with the SHA-224 variant of the SHA-2 family of message-digest implementations.

Improved Support for High Entropy Random Number Generation

The SecureRandom class helps in generating cryptographically strong random numbers used for private or public keys, ciphers, signed messages, and so on.

64-bit PKCS11 for Windows

The PKCS 11 provider support for Windows has been expanded to include 64-bit.

Weak Encryption Disabled by Default

The DES-related Kerberos 5 encryption types are not supported by default. These encryption types can be enabled by adding allow_wbeak_crypto=true in the krb5.conf file, but DES-related encryption types are considered highly insecure and should be avoided.

Unbound SASL for the GSS-API/Kerberos 5 mechanism

The  Krb5LoginModule principal value in a JAAS configuration file can be set to asterisk (*) on the acceptor side to denote an unbound acceptor. This means that the initiator can access the server using any service principal name if the acceptor has the long term secret keys to that service. The name can be retrieved by the acceptor using the GSSContext.getTargName() method after the context is established.

SASL service for multiple host names

While creating a SASL server, the server name can be set to null to denote an unbound server, which means a client can request for the service using any server name.

Java 8 has taken a definitive step in making Java more secure by introducing these new security enhancements.

Build Data Insights and Business Metrics with ELK Stack

With the world around us getting more and more connected, there is advent of different types of computing devices. It could be a heavy-duty server, laptop, desktop, mobile phone or even your refrigerator. One unique thread that connects all these devices is their logging of system information. These logs are nothing but a stream of messages in time-sequence. Systems can now log any piece of structured or unstructured data, application logs, transactions, audit logs, alarms, statistics or even tweets. Add to this the scale of logs. The earlier methodology of human analysis would not work in this kind of scenario. There has to be some automated mechanism for log analysis and deciphering useful information from them.

The trio of Logstash, Kibana and Elasticsearch is one of the most popular open source solutions for logs management. The three products together are known as the ELK stack and provide an elegant solution for log management. At the heart of ELK stack is Elasticsearch which is a distributed, open source search and analytic engine. It is based on Apache Lucene and is designed for horizontal scalability, reliability, and easy management. Logstash is a data collection, enrichment, and transportation pipeline. The ELK stack is completed by Kibana, which is a data visualization platform enabling interaction with data through stunning, powerful graphics.

In order to start your discovery of ELK stack, check out my book titled – Applied ELK Stack: Data Insights and Business Metrics with Collective Capability of ElasticSearch, Logstash and Kibana. With this book you will discover:

  • The need for log analytics, and current challenges
  • How to perform real-time data analytics on streaming data, and turn them into actionable insights
  • How to create indexing and delete data
  • The different components of ELK (Elasticsearch, Logstash, and Kibana) stack
  • Shipping, Filtering, and Parsing Events with Logstash
  • How to build amazing visualizations and dashboards using Data Discovery, Visualization, and Dashboard with Kibana

I hope this book is able to help you with log management along with providing business insights. Do let me know your valuable feedback on the book.

Microservices Asynchronous Communication

Communication among microservices can be either synchronous or asynchronous. Each mode has its own pros and cons. In this post, I would like to share the mechanism of asynchronous communication. As an example, lets consider MyKart, which is an online shopping portal. Just like a typical E-Commerce site, customers (users) can login, browse through product categories, select products and order after paying online. The service architecture is based on micro-service architecture and the application is being built on top of Java stack with the node.js/java script based UI.

Lets see how RabbitMQ can be used for asynchronous communication amongst the microservices.

Service Interaction

The interaction amongst the different services is depicted below:

SequenceDiagram

A typical use case for ordering products is:

  1. Customer (user) logs into the MyKart portal. This is facilitated by the Order service. For existing customers, the details are fetched from Customer service. For new users, a new customer account is created.
  2. Customer can scan through all the different categories of products. The Order service queries the Catalog service to get the details of all the products.
  3. Customer selects the products which he/she wants to purchase. The Billing service is used for online transaction. A payment gateway is used to pay for the purchased products.
  4. Order service maintains records of purchased products and notifies the Shipment service of the transaction. The shipment service will further inform logistics team to prepare for shipping of the purchased products.

Service Interaction and Communication

In order to help with scalability, bulk of the interaction amongst the services would be through asynchronous communication. This would be facilitated by RabbitMQ messaging system. It would help to mention here key concepts of RabbitMQ.

RabbitMQ Fundamentals

The three parts that route messages in RabbitMQ’s are exchanges, queues, and bindings. Applications direct messages to exchanges. Exchanges route those messages to queues based on bindings. Bindings tie queues to exchanges and define a key (or topic) which it binds with.

Exchange

Besides the exchange type, exchanges are declared with a number of attributes, the most important of which are:

  • Name
  • Durability (exchanges survive broker restart)
  • Auto-delete (exchange is deleted when all queues have finished using it)
  • Arguments (these are broker-dependent)

Exchanges can be durable or transient. Durable exchanges survive broker restart whereas transient exchanges do not. Different types of exchanges are as following:

  • Default Exchange – The default exchange is a direct exchange with no name (empty string) pre-declared by the broker. It has one special property that makes it very useful for simple applications: every queue that is created is automatically bound to it with a routing key which is the same as the queue name.
  • Direct Exchange – A direct exchange delivers messages to queues based on the message routing key. A direct exchange is ideal for the unicast routing of messages.
  • Fanout Exchange – A fanout exchange routes messages to all of the queues that are bound to it and the routing key is ignored. If N queues are bound to a fanout exchange, when a new message is published to that exchange a copy of the message is delivered to all N queues. Fanout exchanges are ideal for the broadcast routing of messages.
  • Topic Exchange – Topic exchanges route messages to one or many queues based on matching between a message routing key and the pattern that was used to bind a queue to an exchange. The topic exchange type is often used to implement various publish/subscribe pattern variations. Topic exchanges are commonly used for the multicast routing of messages.
  • Headers Exchange – A headers exchange is designed for routing on multiple attributes that are more easily expressed as message headers than a routing key. Headers exchanges ignore the routing key attribute. Instead, the attributes used for routing are taken from the headers attribute. A message is considered matching if the value of the header equals the value specified upon binding.

Queues

Queues store messages that are consumed by applications. The key properties of queues are as following:

  • Queue Name – Applications may pick queue names or ask the broker to generate a name for them.
  • Queue Durability – Durable queues are persisted to disk and thus survive broker restarts. Queues that are not durable are called transient. Not all scenarios and use cases mandate queues to be durable.
  • Bindings – These are rules that exchanges use (among other things) to route messages to queues. To instruct an exchange E to route messages to a queue Q, Q has to be bound to E. Bindings may have an optional routing key attribute used by some exchange types.

Message Attributes

Messages have a payload and attributes. Some of the key attributes are listed below:

  • Content type
  • Content encoding
  • Routing key
  • Delivery mode (persistent or not)
  • Message priority
  • Message publishing timestamp
  • Expiration period
  • Publisher application id

Message Acknowledgements – We know that networks can be unreliable and this might lead to failure of applications. So it is often necessary to have some kind of processing acknowledgement. In some cases it is only necessary to acknowledge the fact that a message has been received. In other cases acknowledgements mean that a message was validated and processed by a consumer.

Service Communication

All the services of MyKart would utilize RabbitMQ infrastructure for communicating with each other. This is depicted in the following diagram:

RabbitMQ-Comm

When different services want to implement request-response communication using RabbitMQ, there are two key patterns:

  • Exclusive reply queue per request – In this case each request creates a reply queue. The benefits are that it is simple to implement. There is no problem with correlating the response with the request, since each request has its own response consumer. If the connection between the client and the broker fails before a response is received, the broker will dispose of any remaining reply queues and the response message will be lost. Key consideration to take care is that we need to clean up any replies queues in the event that a problem with the server means that it never publishes the response. This pattern has a performance cost because a new queue and consumer has to be created for each request.
  • Exclusive reply queue per client – In this case each client connection maintains a reply queue which many requests can share. This avoids the performance cost of creating a queue and consumer per request, but adds the overhead that the client needs to keep track of the reply queue and match up responses with their respective requests. The standard way of doing this is with a correlation id that is copied by the server from the request to the response.

Durable reply queue – In both the above patterns, the response message can be lost if the connection between the client and broker goes down while the response is in flight. This is because they use exclusive queues that are deleted by the broker when the connection that owns them is closed. This can be avoided by using a non-exclusive reply queue. However this creates some management overhead. One needs some way to name the reply queue and associate it with a particular client. The problem is that it’s difficult for the client to know if any one reply queue belongs to itself, or to another instance. It’s easy to naively create a situation where responses are being delivered to the wrong instance of the client. In the worst case, we would have to manually create and name response queues, which remove one of the main benefits of choosing broker based messaging in the first place.

Rather than using the exclusive reply queue per request, I would recommend to use exclusive reply queue per client. Each service would have its own exchange of type direct where other services would send request events. For response events, the default exchange would be leveraged. All services would have a response queue bound to default exchange corresponding to each service from which response is expected.

Request – Response Event
The request and response messages would be send as an event structure. They would contain the following information:

  1. Correlation Id
    In order to co-relate response event with request event, unique request correlation id would be used. This would help in uniquely identifying a request and tying it up with response. One way of generating unique id is to leverage UUID class of Java. See the code snippet below:
    UUID.randomUUID().toString();
    There can be other mechanisms also for unique id generation, but in my view this is most suitable and user-friendly.
  2. Payload
    The request information and response details would be serialized as a JSON string. While sending a request, the request object details would be converted into JSON format. Similarly, while receiving response, the JSON information would be used to populate the actual response object. The third party Jackson utility would be leveraged for object to JSON conversion and vice-versa.

Service Interaction
Let us take an example to see how the different services would leverage the messaging infrastructure for interaction. Let us take the use case when a customer logs in to the Order service and it then fetches customer information.

  1. Order service needs to get customer information, like name, address, etc. and for this it has to query the Customer service.
  2. The request object contains the customer id. It is converted into JSON format using Jackson.
  3. Java UUID class is used to generate the correlation id.
  4. The message is send to Customer exchange with customer.read as the routing key. The CustomerReadQueue is bound to Customer exchange with this key and so the message gets delivered to CustomerReadQueue.
  5. The reply_to field is used to indicate on which particular queue the response event should be send. In this case, the reply_to field is filled with the value CustomerResponseQueue.
  6. Once the Customer service is able to fetch the desired information, it converts it into JSON format. It then populates the Correlation Id received earlier. The response event is then send to CustomerResponseQueue, which is bound to the default exchange.
  7. The Order service is waiting for response messages on CustomerResponseQueue. Once it receives an event it uses the JSON payload to populate the Response object. The Correlation Id is used to map it to the request object send earlier.

Service Notification
Besides, the regular request-response interaction, the messaging infrastructure can also be used for one way notifications. Any good application should always log important information and also raise metrics/alarms for the same. All the services of MyKart would send important notifications to Logging Service and Metrics service. The Logging service would use these events to log information in logging infrastructure. The Metrics service would use this information to raise metrics and alarms. These metrics are helpful in serviceability of microservices.

For MyKart, there is Fanout exchange by the name of Notification exchange. It would send the received information to all the registered queues. Both the Logging service and Metrics service have their queues bound to this exchange. Events for Logging service are received on LoggingQueue and the events for Metrics service are received on MetricsQueue.

Conclusion

As has been demonstrated in the shopping cart example, RabbitMQ can be used effectively for asynchronous communication among different services.

How to Create a Blog

Today I would like to share my discovery of an interesting blog. It goes by the name of Simple Programmer and its founder is John Sonmez. This blog is little different from you regular tech blog and that adds to the ingenuity. It offers fresh perspective on how programmers should live fuller lives and improve their careers. The content at Simple Programmer is holistic in nature. It offers diverse range of topics like developing your people skills, getting in shape and tackling the mental aspects of being a software developer.

John also offers a free seven part course titled “How to Create a Blog“. I subscribed to this course and received valuable insights on how to start a career boosting blog. The key points of this course are:

  • How to start blog with a good domain name and hosting server.
  • How to select your niche
  • Keep a backlog of topics
  • Be consistent with your blogging activities.

Before I went through this course, I had a blog on WordPress which is essentially a free service. I realized that it is important to have my own domain name. I used Bluehost to register my domain name and also host my blog. Following John’s advice I now have a backlog of ideas and I have promised myself to be consistent.

Go check out this awesome course.

Mining Mailboxes with Elasticsearch and Kibana

In a previous post I had mentioned that the trio of Logstash, Kibana and Elasticsearch (ELK stack) is one of the most popular open source solutions for not only logs management but also data analysis. In this post I will demonstrate how ELK can be used to effectively and efficiently perform big data analysis. As a reference let’s take some huge mailbox data. Mail archives are arguably one of the most interesting kind of social web data. It is omnipresent and each message throws light on the communication which people are having. As a CXO of an organization you may want to analyze corporate mails for trends and patterns.

As a reference, I will take the well-known Enron corpus as it has a huge collection of mails and there is no risk of any legal or privacy concerns. This data will be standardized into Unix mailbox (mbox) format. From the mbox format it will be again transformed into a single json file.

Getting the Enron corpus data

The full Enron dataset in a raw form is available for download in various formats. I will start with the original raw form of the data set that is essentially a set of folders that organizes a collection of mailboxes by person and folder. The following snippet would illustrate the basic structure of the corpus after you have downloaded and unarchived it. Go ahead and play with it a little bit so that you become familiar with it.


C:> cd enron_mail_20110402maildir # Go into the mail directory

C:\enron_mail_20110402\maildir>dir # Show folders/files in the current directory
allen-p         crandell-s     gay-r           horton-s

lokey-t         nemec-g         rogers-b       slinger-r

tycholiz-b     arnold-j       cuilla-m       geaccone-t
<pre>               …directory listing truncated…</pre>
neal-s         rodrique-r     skilling-j     townsend-j
<pre>C:enron_mail_20110402maildir> cd allen-p/ # Go into the allen-p folder

C:enron_mail_20110402maildirallen-p> dir # Show files in the current directory</pre>
_sent_mail         contacts         discussion_threads notes_inbox

sent_items         all_documents     deleted_items     inbox
sent               straw
C:\enron_mail_20110402\maildir\allen-p> cd inbox/ # Go into the inbox for allen-p
C:\enron_mail_20110402\maildirallen-p\inbox> dir # Show the files in the inbox for allen-p

  1. 11. 13. 15. 17. 19. 20. 22. 24. 26. 28. 3. 31. 33. 35. 37. 39. 40.
  2. 44. 5. 62. 64. 66. 68. 7. 71. 73. 75. 79. 83. 85. 87. 10. 12. 14.
  3. 18. 2. 21. 23. 25. 27. 29. 30. 32. 34. 36. 38. 4. 41. 43. 45. 6.

63. 65. 67. 69. 70. 72. 74. 78. 8. 84. 86. 9.
C:\enron_mail_20110402\maildir\allen-p\inbox> cat 1. # Show contents of the file named “1.”

Message-ID: &amp;lt;16159836.1075855377439.JavaMail.evans@thyme&amp;gt;
Date: Fri, 7 Dec 2001 10:06:42 -0800 (PST)
From: heather.dunton@enron.com
To: k..allen@enron.com
Subject: RE: West Position
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: Dunton, Heather &amp;lt;/O=ENRON/OU=NA/CN=RECIPIENTS/CN=HDUNTON&amp;gt;
X-To: Allen, Phillip K. &amp;lt;/O=ENRON/OU=NA/CN=RECIPIENTS/CN=Pallen&amp;gt;
X-cc:
X-bcc:
X-Folder: Phillip_Allen_Jan2002_1Allen, Phillip K.Inbox
X-Origin: Allen-P
X-FileName: pallen (Non-Privileged).pst

Please let me know if you still need Curve Shift.

Thanks,

 

Now the next step is to convert the mail data into Unix mbox format. An mbox is in fact just a large text file of concatenated mail messages that are easily accessible by text-based tools. I have used python script to convert it into mbox format. Thereafter, this mbox file would be converted into ELK compatible JSON format. The json file can be found here. A snippet of json file can be found below:


{"index":{"_index":"enron","_type":"inbox"}}


[{"X-cc": "", "From": "r-3-728402-1640008-2-359-us2-982d4478@xmr3.com", "X-Folder": "\jskillin\Inbox", "Content-Transfer-Encoding": "7bit", "X-bcc": "", "X-Origin": "SKILLING-J", "To": ["jeff.skilling@enron.com"], "parts": [{"content": "n[IMAGE]n[IMAGE]nJoin us June 26th for an on-line seminar featuring Steven J. Kafka, Senior Analyst at Forrester Research, as he discusses how technology can create more effective collaboration in today's virtualized enterprise. Also featuring Mike Hager, VP, OppenheimerFunds, offering insights into implementing these technologies through real-world experiences. Brian Anderson, CMO, Access360 will share techniques and provide tips on how to successfully deploy resources across the virtualized enterprise. nDon't miss this important event. Register now at http://www.access360.com/webinar/ . For a sneak preview, check out our one-minute animation that illustrates the challenges of provisioning access rights across the "virtualized" enterprise.nAbout Access360nAccess360 provides the software and services needed for deploying policy-based provisioning solutions. Our solutions help companies automate the process of provisioning employees, contractors and business partners with access rights to the applications they need. With Access360, companies can react instantaneously to changing business environments and relationships and operate with confidence, whether in a closed enterprise environment or across a virtual or extended enterprise.n nAccess360 nnIf you would prefer not to receive further messages from this sender:n1. Click on the Reply button.n2. Replace the Subject field with the word REMOVE.n3. Click the Send button.nYou will receive one additional e-mail message confirming your removal.nn", "contentType": "text/plain"}], "X-FileName": "jskillin.pst", "Mime-Version": "1.0", "X-From": "Access360 <R-3-728402-1640008-2-359-US2-982D4478@xmr3.com>@ENRON", "Date": {"$date": 991326029000}, "X-To": "Skilling, Jeff </o=ENRON/ou=NA/cn=Recipients/cn=JSKILLIN>", "Message-ID": "<14649554.1075840159275.JavaMail.evans@thyme>", "Content-Type": "text/plain; charset=us-ascii", "Subject": "Forrester Research on Best Practices for the "Virtualized" Enterprise"}

 

When you have huge amount of data to be pushed into Elasticsearch then it is better to do bulk import by specifying the data file. Each mail message is in a line of its own associated with an entry specifying the index (enron) and document (inbox). There is no need to specify the id as Elasticsearch would automatically specify the id.

Data in Elasticsearch can be broadly divided into two types – exact values and full text. Exact values are exactly what they sound like. Examples are a date or a user ID, but can also include exact strings such as a username or an email address. For e.g., the exact value Foo is not the same as the exact value foo. The exact value 2014 is not the same as the exact value 2014-09-15. On the other hand Full text refers to textual data – usually written in some human language – like the text of a tweet or the body of an email. For the purpose of this exercise, it is better to treat Email addresses (To, CC, BCC) as exact values. Hence, we first need to specify the mapping, which can be done in the following manner.

curl -XPUT “localhost:9200/enron” -d "{
"settings":
{
    "number_of_shards": 5,
    "number_of_replicas": 1
},
"mappings":
{
    "inbox":
    {
        "_all":
        {
            "enabled": false
        },
        "properties":
        {
            "To":
            {
                "type": "string",
                "index": "not_analyzed"
            },
            "From":
            {
                "type": "string",
                "index": "not_analyzed"
            },
            "CC":
            {
                "type": "string",
                "index": "not_analyzed"
            },
            "BCC":
            {
                "type": "string",
                "index": "not_analyzed"
            }
        }
    }
}
}"

 

You can verify that the mapping has indeed been set.

curl -XGET "http://localhost:9200/_mapping?pretty"
{
    "enron" :
    {
        "mappings" :
        {
            "inbox" :
            {
                "_all" :
                {
                    "enabled" : false
                },
                "properties" :
                {
                    "BCC" :
                    {
                        "type" : "string",
                        "index" : "not_analyzed"
                    },
                    "CC" :
                    {
                        "type" : "string",
                        "index" : "not_analyzed"
                    },
                    "From" :
                    {
                        "type" : "string",
                        "index" : "not_analyzed"
                    },
                    "To" :
                    {
                        "type" : "string",
                        "index" : "not_analyzed"
                    }
                }
            }
        }
    }
}

 

Now let’s load all the mailbox data by using the json file, in the following manner:


curl -XPOST "http://localhost:9200/_bulk" --data-binary @enron.json

 

We can check if all the data has been uploaded successfully.

curl "localhost:9200/enron/inbox/_count?pretty"
{
    "count" : 41299,
    "_shards" :
    {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    }
}

 

You can see that 41299 records each corresponding to a different message, have been uploaded. Now lets start the fun part by doing some analysis on this data. Kibana provides awesome analytic capability and associated charts. Lets try to see how many messages are circulated on a weekly basis.

enron-date

The above histogram shows the message spread on a weekly basis. The date value is in terms of milliseconds past the epoch. You can see that one particular week has a peak of 3546 messages. Something interesting must be happening that week. Now lets see who are the top recipients of messages

enron-to

You can see that Gerald, Sara, Kenneth are some of the top recipients of messages. How about checking out the top senders?

enron-from

You can see that Pete, Jae and Ken are the top senders of messages. In case you are wondering what exactly Enron employees used to discuss, let’s check out top keywords from message subjects.

enorn-subject

It seems most interesting discussions centered on enron, gas, energy, power. There can be a lot more interesting analysis done with the Enron mail data. I would recommend you try the following:

  • Counting sent/received messages for particular email addresses
  • What was the maximum number of recipients on a message?
  • Which two people exchanged the most messages amongst one another?
  • How many messages were person-to-person messages?

 

 

Java 8 Lambda Expression for Design Patterns – Strategy Design Pattern

The strategy pattern defines a family of algorithms encapsulated in a driver class usually known as Context and enables the algorithms to be interchangeable. It makes the algorithms easily interchangeable, and provides mechanism to choose the appropriate algorithm at a particular time.

The algorithms (strategies) are chosen at runtime either by a Client or by the Context. The Context class handles all the data during the interaction with the client.

The key participants of the Strategy pattern are represented below:

strategy.png
  • Strategy – Specifies the interface for all algorithms. This interface is used to invoke the algorithms defined by a ConcreteStrategy.
  • Context – Maintains a reference to a Strategy object.
  • ConcreteStrategy – Actual implementation of the algorithm as per Strategy interface

Now let’s look at a concrete example of the strategy pattern and see how it gets transformed with lambda expressions. Suppose we have different type of rates for calculating income tax. Based on whether tax is paid in advance or late, there is a rebate or penalty, respectively. We can encapsulate this functionality in the same class as different methods but it would need modification to the class if some other tax calculation is required in future. This is not an efficient approach. Changes in implementation of a class should be the last resort.

Let’s take an optimal approach by using Strategy pattern. We will make an interface for Tax Strategy with a basic method:

public interface TaxStrategy {

	public double calculateTax(double income);
}

Now let’s define the concrete strategy for normal income tax.

public class PersonalTaxStrategy implements TaxStrategy {

	public PersonalTaxStrategy() { }

	@Override
	public double calculateTax(double income) {

		System.out.println("PersonalTax");

		double tax = income * 0.3;
		return tax;
	}
}

The PersonalTaxStrategy class conforms to the TaxStrategy interface. Similarly, let’s define a concrete strategy for late tax payment which incurs a penalty.

public class PersonalTaxPenaltyStrategy implements TaxStrategy {

	public PersonalTaxPenaltyStrategy() { }

	@Override
	public double calculateTax(double income) {

		System.out.println("PersonalTaxWithPenalty");

		double tax = income * 0.4;
		return tax;
	}
}

Next lets define a concrete strategy for advance tax payment which results in tax rebate.

public class PersonalTaxRebateStrategy implements TaxStrategy {

	public PersonalTaxRebateStrategy() { }

	@Override
	public double calculateTax(double income) {

		System.out.println("PersonalTaxWithRebate");

		double tax = income * 0.2;
		return tax;
	}
}

Now let’s combine all classes and interfaces defined to leverage the power of Strategy pattern. Let the main method act as Context for the different strategies. See just one sample interplay of all these classes:

import java.util.Arrays;
import java.util.List;

public class TaxStrategyMain {

	public static void main(String [] args) {

		//Create a List of Tax strategies for different scenarios
		List<TaxStrategy> taxStrategyList =
				Arrays.asList(
						new PersonalTaxStrategy(),
						new PersonalTaxPenaltyStrategy(),
						new PersonalTaxRebateStrategy());

		//Calculate Tax for different scenarios with corresponding strategies
		for (TaxStrategy taxStrategy : taxStrategyList) {
			System.out.println(taxStrategy.calculateTax(30000.0));
		}
	}
}

Running this gives the following output:

PersonalTax
9000.0
PersonalTaxWithPenalty
12000.0
PersonalTaxWithRebate
6000.0

 

It clearly demonstrates how different tax rates can be calculated by using appropriate concrete strategy class. I have tried to combine all the concrete strategy (algorithms) in a list and then access them by iterating over the list.

What we have seen till now is just the standard strategy pattern and it’s been around for a long time. In these times when functional programming is the new buzzword one may ponder with the support of lambda expressions in Java, can things be done differently? Indeed, since the strategy interface is like a functional interface, we can rehash using lambda expressions in Java. Let’s see how the code looks like:

import java.util.Arrays;
import java.util.List;

public class TaxStrategyMainWithLambda {

	public static void main(String [] args) {

		//Create a List of Tax strategies for different scenarios with inline logic using Lambda
		List<TaxStrategy> taxStrategyList =
				Arrays.asList(
						(income) -> { System.out.println("PersonalTax"); return 0.30 * income; },
						(income) -> { System.out.println("PersonalTaxWithPenalty"); return 0.40 * income; },
						(income) -> { System.out.println("PersonalTaxWithRebate"); return 0.20 * income; }
			);

		//Calculate Tax for different scenarios with corresponding strategies
		taxStrategyList.forEach((strategy) -> System.out.println(strategy.calculateTax(30000.0)));
	}
}

Running this gives the similar output:

PersonalTax
9000.0
PersonalTaxWithPenalty
12000.0
PersonalTaxWithRebate
6000.0

 

We can see that use of lambda expressions, makes the additional classes for concrete strategies redundant. You don’t need additional classes; simply specify additional behavior using lambda expression.

All the code snippets can be accessed from my github repo

Java 8 Lambda Expression for Design Patterns – Decorator Design Pattern

The Decorator pattern (also known as Wrapper) allows behavior to be added to an individual object, either statically or dynamically, without affecting the behavior of other objects from the same class. It can be considered as an alternative to subclassing. We know that subclassing adds behavior at compile time and the change affects all instances of the original class. On the other hand, decorating can provide new behavior at run-time for selective objects.

The decorator conforms to the interface of the component it decorates so that it is transparent to the component’s clients. The decorator forwards requests to the component and may perform additional actions before or after forwarding. Transparency allows decorators to be nested recursively, thereby allowing an unlimited number of added responsibilities. The key participants of the Decorator pattern are represented below:

decorator

  • Component – Specifies the interface for objects that can have responsibilities added to them dynamically.
  • ConcreteComponent – Defines an object to which additional responsibilities can be added
  • Decorator – Keeps a reference to a Component object and conforms to Component’s interface. It contains the Component object to be decorated.
  • ConcreteDecorator – Adds responsibility to the component.

Now let’s look at a concrete example of the decorator pattern and see how it gets transformed with lambda expressions. Suppose we have different types of books and these books may be in different covers or different categories. We can chose any book and specify category or language by using inheritance. Book can be abstracted as a class. After that any other class can extend Book class and override the methods for cover or category. But this is not an efficient approach. Under this approach, the sub-classes may have unnecessary methods extended from super class. Also if we had to add more attributes or characterization there would be a change in parent class. Changes in implementation of a class should be the last resort.

Let’s take an optimal approach by using Decorator pattern. We will make an interface for Book with a basic method:

public interface Book {

    public String describe();

}

A BasicBook class can implement this interface to provide minimal abstraction:

public class BasicBook implements Book {

    @Override
    public String describe() {

        return "Book";

    }

}

Next, lets define the abstract class BookDecorator which will act as the Decorator:

abstract class BookDecorator implements Book {

    protected Book book;

    BookDecorator(Book book) {
        this.book = book;
    }

    @Override
    public String describe() {
        return book.describe();
    }
}

The BookDecorator class conforms to the Book interface and also stores a reference to Book interface. If we want to add category as a property to Book interface, we can use a concrete class which implements BookDecorator interface. For Fiction category we can have the following Decorator:

public class FictionBookDecorator extends BookDecorator {

    FictionBookDecorator(Book book) {
        super(book);
    }

    @Override
    public String describe() {
        return ("Fiction " + super.describe());
    }
}

You can see that FictionBookDecorator adds the category of book in the original operation, i.e. describe. Similarly, if we want to specify Science category we can have the corresponding ScienceBookDecorator:

public class ScienceBookDecorator extends BookDecorator {

    ScienceBookDecorator(Book book) {
        super(book);
    }

    @Override
    public String describe() {
        return ("Science " + super.describe());
    }
}

ScienceBookDecorator also adds the category of book in the original operation. There can also be another set of Decorators which indicate what type of cover the book has. We can have SoftCoverDecorator describing that the book has a soft cover.

public class SoftCoverDecorator extends BookDecorator {

	SoftCoverDecorator(Book book) {
		super(book);
	}
	
	@Override
	public String describe() {	
		return (super.describe() + " with Soft Cover");
	}
}

We can also have a HardCoverDecorator describing the book has a hard cover.

public class HardCoverDecorator extends BookDecorator {
	
	HardCoverDecorator(Book book) {
		super(book);
	}
	
	@Override
	public String describe() {	
		return (super.describe() + " with Hard Cover");
	}
}

Now let’s combine all classes and interfaces defined to leverage the power of Decorator pattern. See just one sample interplay of all these classes:

import java.util.List;
import java.util.ArrayList;

public class BookDescriptionMain {
	
	public static void main(String [] args) {
		
		BasicBook book = new BasicBook();
		
		//Specify book as Fiction category
		FictionBookDecorator fictionBook = new FictionBookDecorator(book);
		
		//Specify that the book has a hard cover
		HardCoverDecorator hardCoverBook = new HardCoverDecorator(book);
		
		//What if we want to specify both the category and cover type together
		HardCoverDecorator hardCoverFictionBook = new HardCoverDecorator(fictionBook);				
		
		//Specify book as Science category
		ScienceBookDecorator scienceBook = new ScienceBookDecorator(book);
		
		//What if we want to specify both the category and cover type together
		HardCoverDecorator hardCoverScienceBook = new HardCoverDecorator(scienceBook);				

		//Add all the decorated book items in a list
		List<Book> bookList = new ArrayList<Book>() {
			{
				add(book);
				add(fictionBook);
				add(hardCoverBook);
				add(hardCoverFictionBook);
				add(scienceBook);
				add(hardCoverScienceBook);
			}
		};
		
		//Traverse the list to access all the book items
		for(Book b: bookList) {
			System.out.println(b.describe());
		}		
	}
}

Running this gives the following output:

Book
Fiction Book
Book with Hard Cover
Fiction Book with Hard Cover
Science Book
Science Book with Hard Cover

It clearly demonstrates how different properties can be added to any pre-defined class / object. Also, multiple properties can be combined. I have tried to combine all the decorated book items in a list and then access them by iterating over the list.

What we have seen till now is just the standard decorator pattern and it’s been around for a long time now. In these times when functional programming is the new buzzword one may ponder whether the support of lambda expressions in Java, can things be done differently. Indeed, since the decorated interface is like a function interface, we can rehash using lambda expressions in Java. Lets see how the code looks like:

import java.util.List;
import java.util.ArrayList;

public class BookDescriptionMainWithLambda {
	
	public static void main(String [] args) {
		
		BasicBook book = new BasicBook();
		
		//Specify book as Fiction category using Lambda expression
		Book fictionBook = () -> "Fiction " + book.describe();
		
		//Specify that the book has a hard cover using Lambda expression
		Book hardCoverBook = () -> book.describe() + " with Hard Cover";
		
		//What if we want to specify both the category and cover type together
		Book hardCoverFictionBook = () -> fictionBook.describe() + " with Hard Cover";				
		
		//Specify book as Science category using Lambda expression
		Book scienceBook = () -> "Science " + book.describe();
		
		//What if we want to specify both the category and cover type together
		Book hardCoverScienceBook = () -> fictionBook.describe() + " with Hard Cover";				

		List<Book> bookList = new ArrayList<Book>() {
			{
				add(book);
				add(fictionBook);
				add(hardCoverBook);
				add(hardCoverFictionBook);
				add(scienceBook);
				add(hardCoverScienceBook);
			}
		};
		
		bookList.forEach(b -> System.out.println(b.describe()));
	}
}

Running this gives the similar output:

Book
Fiction Book
Book with Hard Cover
Fiction Book with Hard Cover
Science Book
Fiction Book with Hard Cover

We can see that use of lambda expressions, makes the additional classes for decorators redundant. You don’t need additional classes; simply specify additional behavior using lambda expression. However, there is support for finding the decorator again for re-use. If you have a concrete decorator class, you can reuse it next time also.

All the code snippets can be accessed from my github repo

Book Review – OCP Java SE 7 Programmer II Certification Guide: Prepare for the 1ZO-804 exam

Although this book is essentially for preparing for OCP certification but I feel it is a great reference for anyone interested in learning Java 7 features. Oracle has been adding tons of features with each Java release and it is no mean task to keep up with them. Author has done a great job in providing adequate treatment to all topics. I thoroughly enjoyed reading this book.

High Points:

  • Each chapter has well defined mapping with exam objectives, exam tips, sidebars and Twist in Tale
  • Interesting illustrations which keep the reader focused like Thread Life cycle.
  • Funny way to introduce concepts like overloading, exceptions. I loved the airplane example for exceptions.
  • Inner details explained with decompiled code (For e.g. Enum)
  • Good introduction to design patterns like Singleton, Factory, DAO, etc.
  • Lucid explanation for Collections with interesting examples.
  • Novel way to explain difference between Assertions and Exceptions.
  • Review notes and sample exam questions at the end of each chapter.

Improvement Areas:

  • Compiler errors can be in a different color.
  • I think Page 249 has a typo error. The phrase “Won’t compile; no way to pass argument to T” should be “Won’t compile; no way to pass argument T”
  • More emphasis should be given to StringBuffer and StringBuilder as developers typically get confused in them.
  • Exception Handling – More emphasis on why we should not catch Errors. Moreover, I don’t think extending RuntimeException should be encouraged.
  • Few examples for Java I/O.
  • Concurrency is not an easy area so more detailed treatment with sufficient examples should be given.
  • A consolidated mock paper in the end would help.

I feel this book would be good reference for preparing for OCP certification.

This review originally appeared at Amazon listing of the book.