Build Data Insights and Business Metrics with ELK Stack

With the world around us getting more and more connected, there is advent of different types of computing devices. It could be a heavy-duty server, laptop, desktop, mobile phone or even your refrigerator. One unique thread that connects all these devices is their logging of system information. These logs are nothing but a stream of messages in time-sequence. Systems can now log any piece of structured or unstructured data, application logs, transactions, audit logs, alarms, statistics or even tweets. Add to this the scale of logs. The earlier methodology of human analysis would not work in this kind of scenario. There has to be some automated mechanism for log analysis and deciphering useful information from them.

The trio of Logstash, Kibana and Elasticsearch is one of the most popular open source solutions for logs management. The three products together are known as the ELK stack and provide an elegant solution for log management. At the heart of ELK stack is Elasticsearch which is a distributed, open source search and analytic engine. It is based on Apache Lucene and is designed for horizontal scalability, reliability, and easy management. Logstash is a data collection, enrichment, and transportation pipeline. The ELK stack is completed by Kibana, which is a data visualization platform enabling interaction with data through stunning, powerful graphics.

In order to start your discovery of ELK stack, check out my book titled – Applied ELK Stack: Data Insights and Business Metrics with Collective Capability of ElasticSearch, Logstash and Kibana. With this book you will discover:

  • The need for log analytics, and current challenges
  • How to perform real-time data analytics on streaming data, and turn them into actionable insights
  • How to create indexing and delete data
  • The different components of ELK (Elasticsearch, Logstash, and Kibana) stack
  • Shipping, Filtering, and Parsing Events with Logstash
  • How to build amazing visualizations and dashboards using Data Discovery, Visualization, and Dashboard with Kibana

I hope this book is able to help you with log management along with providing business insights. Do let me know your valuable feedback on the book.

Microservices Asynchronous Communication

Communication among microservices can be either synchronous or asynchronous. Each mode has its own pros and cons. In this post, I would like to share the mechanism of asynchronous communication. As an example, lets consider MyKart, which is an online shopping portal. Just like a typical E-Commerce site, customers (users) can login, browse through product categories, select products and order after paying online. The service architecture is based on micro-service architecture and the application is being built on top of Java stack with the node.js/java script based UI.

Lets see how RabbitMQ can be used for asynchronous communication amongst the microservices.

Service Interaction

The interaction amongst the different services is depicted below:

SequenceDiagram

A typical use case for ordering products is:

  1. Customer (user) logs into the MyKart portal. This is facilitated by the Order service. For existing customers, the details are fetched from Customer service. For new users, a new customer account is created.
  2. Customer can scan through all the different categories of products. The Order service queries the Catalog service to get the details of all the products.
  3. Customer selects the products which he/she wants to purchase. The Billing service is used for online transaction. A payment gateway is used to pay for the purchased products.
  4. Order service maintains records of purchased products and notifies the Shipment service of the transaction. The shipment service will further inform logistics team to prepare for shipping of the purchased products.

Service Interaction and Communication

In order to help with scalability, bulk of the interaction amongst the services would be through asynchronous communication. This would be facilitated by RabbitMQ messaging system. It would help to mention here key concepts of RabbitMQ.

RabbitMQ Fundamentals

The three parts that route messages in RabbitMQ’s are exchanges, queues, and bindings. Applications direct messages to exchanges. Exchanges route those messages to queues based on bindings. Bindings tie queues to exchanges and define a key (or topic) which it binds with.

Exchange

Besides the exchange type, exchanges are declared with a number of attributes, the most important of which are:

  • Name
  • Durability (exchanges survive broker restart)
  • Auto-delete (exchange is deleted when all queues have finished using it)
  • Arguments (these are broker-dependent)

Exchanges can be durable or transient. Durable exchanges survive broker restart whereas transient exchanges do not. Different types of exchanges are as following:

  • Default Exchange – The default exchange is a direct exchange with no name (empty string) pre-declared by the broker. It has one special property that makes it very useful for simple applications: every queue that is created is automatically bound to it with a routing key which is the same as the queue name.
  • Direct Exchange – A direct exchange delivers messages to queues based on the message routing key. A direct exchange is ideal for the unicast routing of messages.
  • Fanout Exchange – A fanout exchange routes messages to all of the queues that are bound to it and the routing key is ignored. If N queues are bound to a fanout exchange, when a new message is published to that exchange a copy of the message is delivered to all N queues. Fanout exchanges are ideal for the broadcast routing of messages.
  • Topic Exchange – Topic exchanges route messages to one or many queues based on matching between a message routing key and the pattern that was used to bind a queue to an exchange. The topic exchange type is often used to implement various publish/subscribe pattern variations. Topic exchanges are commonly used for the multicast routing of messages.
  • Headers Exchange – A headers exchange is designed for routing on multiple attributes that are more easily expressed as message headers than a routing key. Headers exchanges ignore the routing key attribute. Instead, the attributes used for routing are taken from the headers attribute. A message is considered matching if the value of the header equals the value specified upon binding.

Queues

Queues store messages that are consumed by applications. The key properties of queues are as following:

  • Queue Name – Applications may pick queue names or ask the broker to generate a name for them.
  • Queue Durability – Durable queues are persisted to disk and thus survive broker restarts. Queues that are not durable are called transient. Not all scenarios and use cases mandate queues to be durable.
  • Bindings – These are rules that exchanges use (among other things) to route messages to queues. To instruct an exchange E to route messages to a queue Q, Q has to be bound to E. Bindings may have an optional routing key attribute used by some exchange types.

Message Attributes

Messages have a payload and attributes. Some of the key attributes are listed below:

  • Content type
  • Content encoding
  • Routing key
  • Delivery mode (persistent or not)
  • Message priority
  • Message publishing timestamp
  • Expiration period
  • Publisher application id

Message Acknowledgements – We know that networks can be unreliable and this might lead to failure of applications. So it is often necessary to have some kind of processing acknowledgement. In some cases it is only necessary to acknowledge the fact that a message has been received. In other cases acknowledgements mean that a message was validated and processed by a consumer.

Service Communication

All the services of MyKart would utilize RabbitMQ infrastructure for communicating with each other. This is depicted in the following diagram:

RabbitMQ-Comm

When different services want to implement request-response communication using RabbitMQ, there are two key patterns:

  • Exclusive reply queue per request – In this case each request creates a reply queue. The benefits are that it is simple to implement. There is no problem with correlating the response with the request, since each request has its own response consumer. If the connection between the client and the broker fails before a response is received, the broker will dispose of any remaining reply queues and the response message will be lost. Key consideration to take care is that we need to clean up any replies queues in the event that a problem with the server means that it never publishes the response. This pattern has a performance cost because a new queue and consumer has to be created for each request.
  • Exclusive reply queue per client – In this case each client connection maintains a reply queue which many requests can share. This avoids the performance cost of creating a queue and consumer per request, but adds the overhead that the client needs to keep track of the reply queue and match up responses with their respective requests. The standard way of doing this is with a correlation id that is copied by the server from the request to the response.

Durable reply queue – In both the above patterns, the response message can be lost if the connection between the client and broker goes down while the response is in flight. This is because they use exclusive queues that are deleted by the broker when the connection that owns them is closed. This can be avoided by using a non-exclusive reply queue. However this creates some management overhead. One needs some way to name the reply queue and associate it with a particular client. The problem is that it’s difficult for the client to know if any one reply queue belongs to itself, or to another instance. It’s easy to naively create a situation where responses are being delivered to the wrong instance of the client. In the worst case, we would have to manually create and name response queues, which remove one of the main benefits of choosing broker based messaging in the first place.

Rather than using the exclusive reply queue per request, I would recommend to use exclusive reply queue per client. Each service would have its own exchange of type direct where other services would send request events. For response events, the default exchange would be leveraged. All services would have a response queue bound to default exchange corresponding to each service from which response is expected.

Request – Response Event
The request and response messages would be send as an event structure. They would contain the following information:

  1. Correlation Id
    In order to co-relate response event with request event, unique request correlation id would be used. This would help in uniquely identifying a request and tying it up with response. One way of generating unique id is to leverage UUID class of Java. See the code snippet below:
    UUID.randomUUID().toString();
    There can be other mechanisms also for unique id generation, but in my view this is most suitable and user-friendly.
  2. Payload
    The request information and response details would be serialized as a JSON string. While sending a request, the request object details would be converted into JSON format. Similarly, while receiving response, the JSON information would be used to populate the actual response object. The third party Jackson utility would be leveraged for object to JSON conversion and vice-versa.

Service Interaction
Let us take an example to see how the different services would leverage the messaging infrastructure for interaction. Let us take the use case when a customer logs in to the Order service and it then fetches customer information.

  1. Order service needs to get customer information, like name, address, etc. and for this it has to query the Customer service.
  2. The request object contains the customer id. It is converted into JSON format using Jackson.
  3. Java UUID class is used to generate the correlation id.
  4. The message is send to Customer exchange with customer.read as the routing key. The CustomerReadQueue is bound to Customer exchange with this key and so the message gets delivered to CustomerReadQueue.
  5. The reply_to field is used to indicate on which particular queue the response event should be send. In this case, the reply_to field is filled with the value CustomerResponseQueue.
  6. Once the Customer service is able to fetch the desired information, it converts it into JSON format. It then populates the Correlation Id received earlier. The response event is then send to CustomerResponseQueue, which is bound to the default exchange.
  7. The Order service is waiting for response messages on CustomerResponseQueue. Once it receives an event it uses the JSON payload to populate the Response object. The Correlation Id is used to map it to the request object send earlier.

Service Notification
Besides, the regular request-response interaction, the messaging infrastructure can also be used for one way notifications. Any good application should always log important information and also raise metrics/alarms for the same. All the services of MyKart would send important notifications to Logging Service and Metrics service. The Logging service would use these events to log information in logging infrastructure. The Metrics service would use this information to raise metrics and alarms. These metrics are helpful in serviceability of microservices.

For MyKart, there is Fanout exchange by the name of Notification exchange. It would send the received information to all the registered queues. Both the Logging service and Metrics service have their queues bound to this exchange. Events for Logging service are received on LoggingQueue and the events for Metrics service are received on MetricsQueue.

Conclusion

As has been demonstrated in the shopping cart example, RabbitMQ can be used effectively for asynchronous communication among different services.

Java Garbage Collectors – Moving to Java7 Garbage-First (G1) Collector

One of the key strengths of JVM is automatic memory management (Garbage Collection). Its understanding can help in writing better applications. This becomes all the more important as enterprise server applications have large amount of live heap data and significant parallel threads. Until recently, main collectors were parallel collector and concurrent-mark-sweep (CMS) collector. This blog introduces the various Garbage Collectors and compares the CMS collector against its replacement, a new implementation in Java7 i.e. G1. It is characterized by a single contiguous heap which is split into same-sized regions. In fact if your application is still running on the 1.5 or 1.6 JVM, a compelling argument to upgrade to Java 7 is to leverage G1.

We all know that Java programming language is widely used in large server applications which are characterized by large amounts of live heap data and considerable thread-level parallelism. These applications are often run on high-end multiprocessors. Although, throughput is important for such applications, but they are often also sensitive to latency. It becomes important for telecommunication, call-processing applications where delays of even milliseconds in setting up calls can adversely affect the user experience. The Java virtual machine (JVM) specification mandates that any JVM implementation must include a garbage collector (GC) to reclaim unused memory (i.e., unreachable objects). However, the behavior and efficiency of a garbage collector can heavily influence the performance and responsiveness of any application that relies on it.

HotSpot JVM Architecture

The HotSpot JVM architecture supports a strong foundation of features and capabilities that help in realizing high performance and massive scalability. The main components of the JVM include the class loader, the runtime data areas, and the execution engine.

1-jvmcomponents

The three main components of JVM responsible for application performance are heap, JIT compiler and Garbage Collector. All the object data is stored in heap. This area is then managed by the garbage collector selected at startup. Most tuning options help in sizing the heap and choosing the most appropriate garbage collector. The JIT compiler also has a big impact on performance but rarely requires tuning with the newer versions of the JVM.

While tuning a Java application, the key factors to consider are Responsiveness, Throughput and Footprint.

Responsiveness – Indicates how quickly an application or system responds with a requested piece of data. For applications that intend to be responsive, large pause times are not desirable. The aim is to respond in short periods of time. Examples include:

  • How quickly a desktop UI renders pages.
  • How prompt a website is.
  • How fast can a database be accessed.

Throughput – Maximizing the amount of work by an application in a specific period of time. High pause times may be acceptable for applications that focus on throughput. Throughput may be measured by the following criteria:

  • Number of transactions completed in a given time.
  • Number of jobs executed by a batch program in an hour.
  • The number of database queries completed in given time.

Footprint – The amount of heap size occupied by an application.

Typically, for any given application tuning, two out of the above mentioned three factors are chosen and worked upon. For example, if high throughput with minimal footprint is required, then the application would have to compromise on responsiveness. This is so as in order to keep footprint small, frequent garbage collection would be required which may lead to pausing the application during garbage collection.

The Java HotSpot VM garbage collectors are based on Generational Hypothesis. It is based on following principles:

  • Most objects die young
    • Only a few live very long
    • Longer they live, more likely they live longer
  • Old objects rarely reference young objects

Most allocated objects will die young. Few references from older to younger objects exist.

These two observations are collectively known as the weak generational hypothesis, which generally holds true for Java applications. To take advantage of this hypothesis, the Java HotSpot VM splits the heap into three physical areas, as depicted in figure below:

2-vmstructure

Young Generation – This is the place where most new objects are allocated. It is typically small and collected frequently. Since most objects in young generation are expected to die quickly, the number of objects that survive a young generation collection (also referred to as a minor collection) is expected to be low. Minor collections tend to be very efficient as they concentrate on a space that is usually small and is likely to contain a lot of garbage objects. Young generation is further compartmentalized into an area called Eden plus two smaller survivor spaces. Most objects are initially allocated in Eden. The survivor spaces hold objects that have survived at least one young generation collection and have thus been given additional chances to die before being considered “old enough” to be promoted to the old generation. At any given time, one of the survivor spaces (labeled From in the figure) holds such objects, while the other is empty and remains unused until the next collection.

Old Generation – Objects that are too big to fit in young generation are allocated directly from old generation. Similarly, objects that are longer-lived are promoted (or tenured) to the old generation. The old generation is typically larger than the young generation and it gets occupied more slowly. This results in old generation collections (also referred to as major collections) to be infrequent but lengthy.

Permanent Generation – The Permanent generation contains JVM metadata which describes the classes and methods used in the application. It is populated by the JVM at runtime based on classes in use by the application. In addition, Java SE library classes and methods may be stored here.

Different garbage collection strategies are followed for different regions.

Serial Collector – The Serial Collector does collection for both young and old generation, serially. There is a stop-the-world pause during both minor and major collections. The application processing resumes once the collection is finished. The Serial Collector works fine for client side applications that do not have low pause time requirements.

Parallel Collector – These days, machines with a lot of physical memory and multiple CPUs is quite common. The parallel collector takes advantage of multiple CPUs rather than keeping most of them idle while only one does garbage collection work. It uses a parallel version of the young generation collection algorithm utilized by the serial collector. It is still a stop-the-world and copying collector, but performing the young generation collection in parallel leads to decrease in garbage collection overhead and increase in application throughput. Server side applications that run on machines with multiple CPUs and don’t have pause time constraints benefit from parallel collector.

Concurrent Mark-Sweep (CMS) Collector – In many cases, end-to-end throughput is not as important as fast response time. Young generation collections do not cause long pauses most of the times. However, old generation collections, though infrequent, can cause long pauses, especially when large heaps are involved. To address this issue, the HotSpot JVM includes a collector called the concurrent mark-sweep (CMS) collector, also known as the low-latency collector. It collects the young generation the same way the Parallel and Serial Collectors do. Its old generation, however, is collected concurrently along with application threads. This results in shorter pauses.

Let us now see how CMS does garbage collection for both the young and old generations.

Young Generation Collection by CMS Collector – CMS collects young generation using multiple threads just like Parallel Collector. Figure below illustrates a typical heap ready for young generation collection:

3cms1

The young generation comprises of one Eden and two Survivor spaces. The live objects in Eden are copied to the initially empty survivor space, labeled S1 in the figure, except for ones that are too large to fit comfortably in the S1 space. Such objects are directly copied to the old generation. The live objects in the occupied survivor space (labeled S0) that are still relatively young are also copied to the other survivor space, while objects that are relatively old are copied to the old generation. If the S1 space becomes full, the live objects from Eden or S0 that have not been copied to it are tenured, regardless of their age. Any objects remaining in Eden or the S0 space after live objects have been copied are not live and need not be examined. Figure below illustrates the heap after young generation collection:

3cms2

The young generation collection leads to stop the world pause. After collection, eden and one survivor space are empty. Now let’s see how CMS handles old generation collection. It essentially consists of two major steps – marking all live objects and sweeping them.

Old Generation Collection in CMS

3cms3

The marking is done in stages. At the beginning there is a short pause, called initial mark, which identifies the set of objects that are immediately reachable from outside the heap. This has a stop the world pause. Thereafter, during the concurrent marking phase, it marks all live objects that are transitively reachable from this set. Since the application is running and updating reference fields (hence, modifying the object graph) while the marking phase is taking place, not all live objects are guaranteed to be marked at the end of the concurrent marking phase. To care of this, the application stops again for a second pause, called remark, which finalizes marking by revisiting any objects that were modified during the concurrent marking phase. As the remark pause has a substantial stop the world pause, multiple threads are used to increase its efficiency. At the end of the remark phase, all live objects in the heap are guaranteed to have been marked. Since revisiting objects during the remark phase increases the amount of work the collector has to do, its overhead increases as well. This is a typical trade-off for most collectors that attempt to reduce pause times.

The heap structure after mark phase(s) can be seen below where the live objects are in light blue colored blocks.

3cms4

3cms5

After marking of all live objects in old generation is done, the concurrent sweeping happens which sweeps over the heap, de-allocating garbage objects in-place without relocating the live ones. As the figure illustrates, object marked with dark color are assumed to be garbage. After the sweeping phase, the dark colored objects are removed and only blue colored (live) objects remain. The free space is not contiguous and the collector needs to employ a data structure (free lists, in this case) that records which parts of the heap contain free space. As a result, allocation into the old generation is more expensive. This imposes extra overhead to minor collections, as most allocations in the old generation take place when objects are promoted during minor collections. Another disadvantage of CMS is that it typically has larger heap requirements. There are few reasons for this. First, a concurrent marking cycle lasts longer than that of a stop-the-world collector. And it is only during the sweeping phase that space is actually reclaimed. Given that the application is allowed to run during the marking phase, it is also allowed to allocate memory, hence the occupancy of the old generation potentially will grow during the marking phase and drop only during the sweeping phase. Additionally, despite the collector’s guarantee to identify all live objects during the marking phase, it doesn’t actually guarantee that it will identify all objects that are garbage. Some objects that will become garbage during the marking phase may or may not be reclaimed during the cycle. If they are not, then they will be reclaimed during the next cycle. Garbage objects that are wrongly identified as live are usually referred to as floating garbage.

The heap becomes fragmented due to the lack of compaction and it might also prevent the collector from using the available space as efficiently as possible.

Finally, it is very tedious to tune the CMS collector. There are lots of options and it takes a lot of experimentation to arrive the best configuration for a particular application.

Introducing Garbage-First (G1) Collector

In order to overcome the shortcomings of CMS and not comprise throughput, the Garbage-First (G1) Collector has been introduced. The G1 collector is a server-style garbage collector, targeted for multi-processors with large memories, that meets a soft real-time goal with high probability, while achieving high throughput. The G1 garbage collector is fully supported in Oracle JDK 7 update 4 and later releases. The G1 collector is suitable for applications that:

  • Can work concurrently along with application threads like CMS collector.
  • Compacts free space without lengthy GC induced pause times.
  • Require more predictable GC pause durations.
  • Want reasonable throughput performance.
  • Do not want a much larger Java heap.

G1 is supposed to be the long term replacement for the Concurrent Mark-Sweep Collector (CMS). G1 offers many benefits in comparison to CMS. First and foremost, G1 is a compacting collector. G1 compacts sufficiently which leads to elimination of potential fragmentation issues, to a large extent. Also, G1 offers more predictable garbage collection pauses than the CMS collector, and allows users to specify desired pause targets. The Refinement, Marking and Cleanup phases are concurrent, while the young generation collection is done using multiple threads in parallel. The full garbage collection continues to be single threaded but if tuned properly applications should avoid full GCs. Another major benefit is that G1 is easy to use and tune. When performing garbage collections, G1 operates in a manner similar to the CMS collector. G1 performs a concurrent global marking phase to determine the liveness of objects throughout the heap. After the completion of mark phase, G1 knows which regions are mostly empty and it collects in these regions first. This usually yields a large amount of free space. This is why this method of garbage collection is called Garbage-First.

G1 Heap Overview

3gone1

The heap in case of G1 is differently organized in comparison to earlier generational GCs. The heap is one large contiguous spaced partitioned into a set of equal-sized heap regions (approximately 2000 in number). The region size is chosen at startup and varies from 1 MB to 32 MB. There is no physical separation between young and old generation. A region may act as wither eden, survivor(s) or old generation. This provides a greater flexibility in memory usage. Objects are moved between regions during collections. For large objects (> 50% of region size), humongous regions are used. Currently, G1 is not optimized for collecting large objects in humongous regions.

Young Generation Collection in G1

4gone2

Live objects are evacuated (copied/moved) to one or more survivor regions. If the aging threshold is met, some of the objects are promoted to old generation regions. It involves a stop the world (STW) pause. It’s done in parallel with multiple threads to shorten the pause time. Eden size and survivor size is calculated for the next young GC. Pause time goal are taken into consideration. This approach makes it very easy to resize regions, making them bigger or smaller as needed.

Old Generation Collection in G1

4gone3

The old generation collection starts with the Initial Marking phase and it is piggybacked on young generation collection. It is a stop the world event and survivor regions (root regions) which may have references to objects in old generation are marked.

4gone4

In the Concurrent Marking phase liveness information per region is determined while the application is running. Live objects are found over the entire heap. This activity may get interrupted by young generation collections. Any empty regions found (denoted as X) is removed immediately in the Remark phase.

4gone5

The Remark phase completes the marking of live objects in the heap. G1 collector uses an algorithm called snapshot-at-the-beginning (SATB) which is much faster than what is used in the CMS collector. Empty regions are removed and reclaimed. Region liveness is now calculated for all regions and this is a stop-the-world event.

4gone6

In the Copying/Cleanup phase, G1 selects the regions with the low “liveness”. These regions can be collected fastest and this cleanup happens at the same time as a young GC. So both young and old generations are collected at the same time.

4gone7

After the cleanup phase, selected regions are collected and compacted. This is represented in dark blue region and the dark green region shown in the figure. Some garbage objects may be left in old generation regions and they may be collected later based on future liveness, pause time target and number of unused regions.

G1 Old Generation GC Summary

  • Concurrent Marking Phase
    • Calculates liveness information per region, concurrently while the application is running
    • Identifies best regions for subsequent evacuation phases
    • No corresponding sweeping phase
  • Remark Phase
    • Different marking algorithm than CMS
    • Uses Snapshot-at-the-beginning (SATB) which is much faster than what was being used in CMS
    • Copying/Cleanup Phase
  • Completely empty regions are reclaimed
    • Young generation and Old generation reclaimed at the same time
    • Old generation regions selected based on their liveness

G1 and CMS Comparison

Features G1 GC CMS GC
Concurrent and Generational Yes Yes
Releases Maximum Heap memory after usage Yes No
Low Latency Yes Yes
Throughput Higher Lower
Compaction Yes No
Predictability More Less
Physical Separation between Young and Old No Yes

Footprint Overhead

For the same application size, as compared to CMS, the heap size is likely to be larger in G1 due to additional accounting data structures

Remembered Sets (RSets / RSet) – The RSets track object references into a given region and there is one RSet per region. This enables parallel and independent collection of a region as there is no need to track whole heap to find references. Footprint overhead due to RSets is less than 5%. More inter-region references can lead to bigger Remembered Set which in turn leads to a slow GC.

Collection Sets (CSets / CSet) – The CSet is set of regions that will be collected in a GC cycle. Regions can be eden and survivor, and optionally after (concurrent) marking some old generation regions. All live data in a CSet is evacuated (copied/moved) during the GC. It has a footprint overhead less than 1%.

Command Line Options

To start using the G1 Garbage Collector use the following option:

-XX:+UseG1GC

To set target for the maximum GC pause time, use the following option:

-XX:MaxGCPauseMillis=200

Tuning Options

The main goal of G1 GC is to reduce latency. If latency is not a problem, then Parallel GC can be used. A Related goal is simplified tuning. The most important tuning option is XX:MaxGCPauseMillis=200 (default value = 200ms). It Influences maximum amount of work per collection. However, this is Best effort only.

A trigger to start GC can be given by the option -XX:InitiatingHeapOccupancyPercentage=n. It specifies percent of entire heap not just old generation. Automatic resizing of young generation has lower and upper bound of 20% and 80% of java heap, respectively. One should be cautious while using option as too low value can lead to unnecessary GC overhead and too high value can lead to Space Overflow.

Threshold for region to be included in a Collection Set can be specified by -XX:G1OldCSetRegionLiveThresholdPercent=n. One should be cautious while using option as too high value can lead to more aggressive collecting and too low value can lead to heap wastage.

The Mixed GC / Concurrent Cycle can be specified using -XX:G1MixedGCCountTarget=n. A too high value can lead to unnecessary overhead and a too low can lead to longer pauses.

Care must be taken if young generation size is fixed using the option –Xmn. This can cause PauseTimeTarget to be ignored. G1 no longer respects the pause time target. Even if heap expands, the young generation size is fixed.

Sample Application Test

Let’s create a sample application to to measure performance and behavior of CMS and G1 collectors. The basic algorithm is described below:

  • Create and add 190 Float Arrays into an Array List
  • Each Float Array reserves 4MB of memory, i.e. 1 x 1024 x 1024 = 4 MB
  • 4 MB x 190 = 760 MB
  • After each iteration the arrays are released and application sleeps for some time
  • Same steps are repeated certain number of times

I have run this application on a Windows 7 machine and VisualVM is used to analyze GC logs.

CMS Collector Results

Command Line Arguments to test CMS Collector:

java -server -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:CMS.log -Dcom.sun.management.jmxremote.port=3333 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -classpath C: GCTest 190

Observations with VisualVM

5cmsvm

G1 Collector Results

Command Line Arguments to test G1 Collector:

java -server -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:G1GC.log -Dcom.sun.management.jmxremote.port=3333 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -classpath C: GCTest 190

Observations with VisualVM

5g1vm

Results Comparison

Parameters G1 GC CMS GC
Time Taken for Execution 7 min 5 sec 7 min 56 sec
Max CPU Usage 27.3% 70.2%
Max GC Activity 2% 24%
Max Heap Size 974 MB 974 MB
Max Used Heap Size 763 MB 779 MB
  • G1 GC is able to reclaim max heap size
    • CMS is not able to do so
  • Lesser CPU utilization for G1 collection
  • G1 Heap goes to max size in three distinct jumps
    • CMS seems to gain max heap size in initial jump

Should You Move to G1 GC

Now that we have covered all the good things about G1 GC, the most pertinent question is in which cases it should be used. I would suggest exercising cautious optimism. Don’t rush blindly to embrace it as it also has some costs (high heap size) and may actually not be a good fit for certain scenarios. Evaluate all other options before moving to G1 GC.

If you don’t need low latency then you are better off using parallel GC. If you don’t need a big heap, then use a small heap and parallel GC. If you need a big heap, then first try CMS collector. If CMS is not performing well, then try to tune it. If all attempts at tuning CMS are not paying dividends then you can consider using G1 GC.