
Deep Dive into Node.js with James Snell


Node.js is one of the most widely used JavaScript runtimes globally, and it is unique in the approach it follows to make our code work. As developers, we usually overlook how the tools we rely on in our daily work actually function under the hood.

In this article, we will dive deep into the Node.js internals using James Snell's talk as our guide, expanding on some areas to clarify the concepts discussed.

We will learn about the Event Loop and the asynchronous model of Node.js. We will understand Event Emitters and how they power almost everything inside Node, and then we will build on that knowledge to understand what Streams are, and how they work.

Event Loop

We will begin our journey inside Node.js by exploring the Event Loop. The Event Loop is one of the most critical aspects of Node.js to understand. The Event Loop is the big orchestrator: the mechanism in charge of scheduling Node.js' synchronous and asynchronous work. This section explains how that scheduling mechanism works.

When the Node.js process begins, it starts multiple threads: the main process thread and the Libuv thread pool, which has four worker threads by default. The Libuv threads handle heavy workloads like IO: reading files from the disk, running encryption, or reading from a socket.

Node.js Process high level diagram.

We can see the Event Loop as a glorified for/while loop that lives inside the main thread. The loop runs at the C++ level. The Event Loop must perform several steps to complete a full iteration. These steps involve performing different checks and listening to OS events: checking whether any expired timers need to be fired, and whether there is any pending IO to process or schedule.

The event loop steps using Libuv threads

These tasks run at the C++ level, but the events associated with the steps usually involve a callback: the action to execute when a timer expires, or when a new chunk of a file is ready to be processed. When this happens, the callback executes in Javascript. Because the event loop lives in the process' main thread, every time one of the steps of the event loop is processed, the event loop and the main thread are blocked until that step completes. This means that even if Libuv is still executing IO operations while the main thread works through a step's tasks, the results of those IO operations are not accessible until the main thread is unblocked and iterates to the step that handles them.

"There is no such thing as asynchronous Javascript" James Snell

But if asynchronicity doesn't exist in Node and Javascript, what are Promises for?

Promises are just a mechanism to defer processing to the future. The power of Promises is the ability to schedule code execution after a specific set of events happens. But we will learn more about Promises later in this article.

Step Execution

Now that we understand the event loop high-level picture, it is time to go one level deeper and understand what happens in each step of the Event Loop.

We learned that each step of the event loop executes in C++, but what happens if that step needs to run some Javascript?

The C++ step code will use the V8 APIs to execute the passed Javascript code in a synchronous call. That Javascript code could have calls to a Node API named process.nextTick(fn), which receives another function as an argument.

"nextTick is not a good name because no tick is involved." James Snell

If present, process.nextTick(fn) appends its argument function to a queue data structure. Later, we will find that there is a second queue where the Javascript code can append functions, but for simplicity, let's assume for now that there is only one queue: the nextTick queue. Once the Javascript runs and finishes filling the queue through the process.nextTick method, it returns control to the C++ layer.

Now it is time for Node to drain the nextTick queue synchronously, running each of the queued functions in the order they were added. Only when the queue is empty can the event loop move to the next step and start the same process again.

Event loop processing Javascript in action. Simplified version

Remember, everything described above runs synchronously.

"Any time you have Javascript running, anything else is happening" James Snell

Therefore, the key to keeping your Javascript performant is to keep your functions small, and use a scheduling mechanism to defer work.

But what is the scheduling mechanism?

The scheduling mechanisms are the instruments through which Node.js can simulate asynchronicity by deferring the execution of a given Javascript function to some point in the future. Two of these mechanisms are the nextTick queue and the Microtask queue. The difference between them is the order in which they execute: Node.js will only start draining the Microtask queue after the nextTick queue is empty, and the nextTick queue only after the call stack is empty.

The call stack is a LIFO (Last In, First Out) data structure. The operations on the stack are completely synchronous and are scheduled to run as soon as possible. The V8 API that we saw before runs the Javascript code sent from the C++ layer by pushing statements onto the stack, executing them, and popping them off as they complete.

We saw how the nextTick queue is filled during V8 execution while processing statements from the stack, and how it drains as soon as the Javascript stack is empty and control returns to the C++ layer.

The Microtask queue is the one that processes Promise continuations (thens, catches, and finallys) in V8. It is drained immediately after the nextTick queue is empty.

Event Loop high level view.

Let's paint a picture of how this works. The following represents a function body executing several tasks.

  // Some JS Code
  // Block A ... this could be code like var c = a + b, etc
  promise.then(fn)
  // Block B ... this could be code like var c = a + b, etc
  nextTick(fn)
  // Block C ... this could be code like var c = a + b, etc

But the final order in which Node will execute this code looks more like

  // Some JS Code
  // Block A ... this could be code like var c = a + b, etc  
  // Block B ... this could be code like var c = a + b, etc  
  // Block C ... this could be code like var c = a + b, etc
  nextTick(fn)
  promise.then(fn)

Notice how nextTick is processed after the stack of operations is emptied, and the Microtask operations (the Promise continuations) are deferred until just after the nextTick queue is emptied.
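
To see this ordering in practice, here is a minimal, runnable sketch of the snippet above, using console.log calls as stand-ins for the synchronous blocks:

  // Block A
  console.log('Block A');

  // Schedule a Promise continuation (Microtask queue)
  Promise.resolve().then(() => console.log('promise.then(fn)'));

  // Block B
  console.log('Block B');

  // Schedule a callback on the nextTick queue
  process.nextTick(() => console.log('nextTick(fn)'));

  // Block C
  console.log('Block C');

  // Output:
  // Block A
  // Block B
  // Block C
  // nextTick(fn)
  // promise.then(fn)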

However, there are other scheduling mechanisms:

  • Timers like setTimeout, setInterval, etc
  • Immediates, whose execution order is interleaved with timers, although they are not timers. A setImmediate(fn) registers a callback that will execute at the end of the current event loop turn, or just before the next event loop turn starts.
  • Promises, which are not the same as the Promise continuations that the Microtask queue handles.
  • Callbacks on the native layer, something that is not necessarily available to the Javascript developer.
  • Worker threads, which are separate Node instances with their own main thread, their own Libuv threads, and their own event loop. A worker is basically a complete Node instance running in a separate thread. Because it is a separate instance, it can run Javascript in parallel to the main Node instance (see the sketch after this list).
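
As a rough illustration of that last point, here is a minimal sketch using Node's worker_threads module; the inline, eval'd worker script is just a stand-in for real CPU-heavy work:

  const { Worker } = require('worker_threads');

  // A complete Node instance runs this script in a separate thread,
  // with its own event loop, while the main thread keeps running.
  const worker = new Worker(
    `
    const { parentPort } = require('worker_threads');
    let sum = 0;
    for (let i = 0; i < 1e8; i++) sum += i; // CPU-heavy work
    parentPort.postMessage(sum);
    `,
    { eval: true }
  );

  worker.on('message', (result) => console.log('Worker result:', result));
  console.log('The main thread is not blocked while the worker runs');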

"NextTick and Immediate names should be inverted because NextTick operations happen immediately after the Javascript stack is empty and Immediate functions will happen just before the nextTick start."

To find complementary resources, you can go to the Node.js documentation.

Event Emitter

Event emitters are one of the first Node APIs ever created. Almost everything in Node is an event emitter. It is the first thing that loads and is fundamental to how Node works.

If you see an object with the following methods, you have an event emitter.

on('eventName', fn)
once('eventName', fn)
emit('eventName', arg...)

But how do these work?

Inside the event emitter object, we have a map/look-up table, which is just another object. Inside each map entry, the key is the event name, and the value is an array of callback functions.

The on method on the event emitter object receives the event name as the first argument, and a callback function as its second. When called, this method adds the on callback function into the corresponding array of the look-up table.

The once method behaves like the on method, but with a key difference; instead of storing the once callback function directly, it will wrap it inside another function, and then add the wrapped function to the array. When the wrapper function gets executed, it will execute the wrapped callback function, removing itself from the array.

On the other hand, emit uses the event name to access the corresponding array. It will create a backup copy of the array, and iterate synchronously over each callback function in the array, executing it.

Event Emitters

It is important to highlight that emit is synchronous. If any of the callback functions in the array take a long time to execute, the main thread will be blocked. Emit won't return until all of the functions in the array have executed.

The asynchronous behavior of emit is an illusion. This effect is caused because, internally, Node will invoke each callback function using nextTick, therefore deferring the function's execution into the future.
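
Here is a small sketch of the synchronous behavior described above, using Node's events module with an arbitrary event name and handlers:

const EventEmitter = require('events');

const emitter = new EventEmitter();

emitter.on('greet', (name) => console.log(`Hello, ${name}!`));
emitter.once('greet', (name) => console.log(`This only runs once, for ${name}`));

console.log('Before emit');
emitter.emit('greet', 'Ada'); // Both listeners run synchronously, right here
console.log('After emit');    // Only logs once every listener has returned

emitter.emit('greet', 'Grace'); // The once listener is gone; only the on listener runs

// To defer the work instead, schedule the emit itself:
process.nextTick(() => emitter.emit('greet', 'Linus'));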

You can find more info about Event Emitter at the Node.js documentation.

Streams (Node | Web)

Streams are one of the fundamental concepts that power Node.js applications. They are a way to handle reading/writing files, network communications, or any end-to-end information exchange efficiently.

Streams are not a concept unique to Node.js. They were introduced in the Unix operating system decades ago, and programs can interact with each other by passing streams through the pipe operator (|).

For example, traditionally, when you tell the program to read a file, it is read into memory, from start to finish, and then you process it.

With streams, you read it piece by piece, processing its content without keeping it all in memory. The Node.js stream module provides the foundation upon which all streaming APIs are built. All streams are instances of EventEmitter.

Node Streams

There are four Node stream types: Readable, Writable, Duplex, and Transform. All of them are Event Emitters.

Readable: a stream you can pipe from but not pipe into (you can receive data, but not send data to it). When you push data into a readable stream, it is buffered until a consumer starts to read the data.

Writable: a stream you can pipe into but not pipe from (you can send data but not receive from it).

Duplex: a stream you can both pipe into and pipe from. Basically a combination of a Readable and Writable stream.

Transform: a Transform stream is similar to a Duplex, but the output is a transform of its input.

Readable Stream

In an oversimplified view, the Readable stream works through a queue and a highwatermark. The highwatermark delimits how much data can be in the queue, but the Readable Stream will not enforce that limit.

Every time data is pushed into the queue, the stream gives feedback to the client code, telling it whether the highwatermark has been reached, and transferring the responsibility to the client. The client code must decide whether to continue pushing data and overflow the queue.

This feedback is received through the push method, which is the mechanism for pushing data into the queue. If the push method returns true, the highwatermark has not been reached, and you can push more. If the push method returns false, the highwatermark has been reached, and you are not supposed to push more, although you still can.

The readable stream
Events

When data has been pushed into the queue, the internals of the Readable Stream will emit a couple of events.

The 'readable' event is part of the pull model; it alerts that the stream is readable and has data to be read. When listening to the 'readable' event, the read() method can be called. When the client code calls read(), it gets a chunk of data out of the queue, dequeuing it. The client code can keep calling read() until there is no more data in the queue. After emptying the queue, the client code can resume pulling data when the 'readable' event fires again.

The 'data' event is part of the push model. With this event, any new chunk of data pushed with the push method is synchronously sent to its listeners, which means we don't need to call the read method; the data arrives automatically. However, there is another crucial difference: the sent data is never stored in the queue. The queue is only filled when there is no active 'data' event listener. This is called "flowing" mode because the data just flows, and it doesn't get stored.

Another event is the 'end' event, which signals that there is no more data to be consumed from the stream.

The 'close' event is emitted when the stream and any underlying resources (a file descriptor, for example) have been closed. The event indicates that no more events will be emitted, and no further computation will occur.

The 'error' event may be emitted by a Readable implementation. Typically, this may occur if the underlying stream cannot generate data due to an underlying internal failure or when a stream implementation attempts to push an invalid chunk of data.

The key to understanding Readable Streams and their events is that Readable Streams are just Event Emitters. Like Event Emitters, Readable Streams don't have anything built into them that is asynchronous; they will not call any of the scheduling mechanisms. Therefore, they operate purely synchronously, and to obtain asynchronous behavior, the client code needs to defer when it calls the different stream APIs like the read() and push() methods. Let's understand how!

When creating a Readable Stream, we need to provide a read function. The stream will repeatedly call this read function as long as its buffer is not full; however, after it calls it once, it will wait until we call push before calling it again. If we synchronously call push after the Readable Stream called read and the buffer is not full, the stream will synchronously call read again, up until we call push(null), marking the end of the stream. Otherwise, we can defer the push calls to some other point in time, effectively making the Stream read calls operate asynchronously; we might, for example, wait for some File IO callback to return.

Examples

Creating a Readable Stream

const Stream = require('stream');

// Create the Readable Stream
const readableStream = new Stream.Readable();

// Implement the read method
readableStream._read = () => {};

Alternatively, the read method can be provided inline when constructing the stream:

const Stream = require('stream');

// Create the Readable Stream with the read method inlined.
const readableStream = new Stream.Readable({
  read() {},
});

Using a Readable Stream

// Synchronous
readableStream.push('hi!');
readableStream.push('ho!');

// Asynchronous
process.nextTick(() => {
  readableStream.push('hi!');
  readableStream.push('ho!');
})

// Asynchronous
somePromise.then(() => {
  readableStream.push('hi!');
  readableStream.push('ho!');
})
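
On the consuming side, here is a minimal sketch of the two models described earlier, assuming the readableStream created above; you would typically use one model or the other, not both at once:

// Pull model: wait for the 'readable' event, then call read() until the queue is empty
readableStream.on('readable', () => {
  let chunk;
  while ((chunk = readableStream.read()) !== null) {
    console.log('Pulled:', chunk.toString());
  }
});

// Push model ("flowing" mode): chunks are delivered to the listener as they are pushed
readableStream.on('data', (chunk) => {
  console.log('Received:', chunk.toString());
});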

Writable Stream

Writable Streams work similarly. When creating a Writable Stream, we need to provide a _write(chunk, encoding, callback) function, and the stream exposes an external write(chunk) function. When the external write function is called, it will call the internal _write function.

The internal _write function must call its callback argument. Let's imagine that we write ten chunks; when the first chunk is written, if the internal _write function doesn't invoke the callback, the chunks will accumulate in the Writable Stream's internal buffer. When the callback is invoked, the stream will grab the next chunk and write it, draining the buffer. This means that if _write calls the callback synchronously, all those writes will happen synchronously; if it defers invoking the callback, all those writes will happen asynchronously, and the data will accumulate in the internal buffer up until the highwatermark is hit. Even then, you can decide to keep growing the buffer queue.

Events

Like the Readable Stream, the Writable Stream is an Event Emitter that will provide its event listeners with useful alerts.

The 'drain' event notifies that the Writable buffer has been drained, and more writes are allowed. If a call to the write function returns false, indicating backpressure, the 'drain' event will be emitted when it is appropriate to resume writing data to the stream.

The 'close' event is emitted when the stream and any underlying resources (a file descriptor, for example) have been closed. The event indicates that no more events will be emitted, and no further computation will occur.

The 'error' event is emitted if an error occurs while writing or piping data. The listener callback is passed a single Error argument when called.

The 'finish' event is emitted after the stream.end() method has been called, and all data has been flushed to the underlying system.

Examples

Creating a Writable Stream

const Stream = require('stream');

// Create the Writable Stream
const writableStream = new Stream.Writable();

// Implement the _write method synchronously
writableStream._write = (chunk, encoding, callback) => {
  console.log(chunk.toString());
  // The callback is invoked synchronously
  callback();
};

// Implement the _write method, deferring the callback
writableStream._write = (chunk, encoding, callback) => {
  console.log(chunk.toString());
  // The callback is deferred, so the writes happen asynchronously
  process.nextTick(callback);
};
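
Using a Writable Stream

Here is a sketch of driving the stream from the outside, illustrating the backpressure and 'drain' behavior described above; the chunk count is arbitrary:

let i = 0;

function writeSomeChunks() {
  let ok = true;
  while (i < 10 && ok) {
    // write() returns false once the internal buffer passes the highwatermark
    ok = writableStream.write(`chunk ${i}\n`);
    i++;
  }
  if (i < 10) {
    // Backpressure: wait for the buffer to drain before writing more
    writableStream.once('drain', writeSomeChunks);
  } else {
    writableStream.end();
  }
}

writeSomeChunks();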

Duplex Stream

Duplex streams are streams that implement both the Readable and Writable interfaces. We can see a Duplex stream as an object containing both a Readable Stream and a Writable Stream. These two internal objects are not connected, and each has independent buffers.

They share some events; for example, when the 'close' event is emitted, both sides are closed. They still emit the stream type-specific events independently, like 'readable' and 'data' for the Readable side, or 'drain' for the Writable side.

However, this doesn't mean that data will be available on the Readable stream if we write into the Writable Stream. Those are independent.

Duplex streams are especially useful for Sockets.

Duplex Stream
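
As a rough sketch of that independence, here is a Duplex stream with arbitrary read and write implementations:

const Stream = require('stream');

// The readable and writable sides have independent buffers and logic
const duplexStream = new Stream.Duplex({
  read() {
    this.push('data from the readable side');
    this.push(null); // No more data to read
  },
  write(chunk, encoding, callback) {
    console.log('Written to the writable side:', chunk.toString());
    callback();
  },
});

duplexStream.on('data', (chunk) => console.log('Read:', chunk.toString()));
duplexStream.write('hello'); // Does NOT show up on the readable side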

Transform Stream

Transform streams are Duplex streams where the output is related to the input. Unlike plain Duplex streams, when we write a chunk of data into the Writable side, that data is passed to the Readable side by calling a transform function and invoking the push method on the readable side with the transformed data. As with Duplex, both the Readable and the Writable sides have independent buffers with their own highwatermark.

Examples

Creating a Transform Stream

const { Transform } = require('stream');

// Create the Transform stream
const transformStream = new Transform();

// Implement the _transform method
transformStream._transform = (chunk, encoding, callback) => {
  transformStream.push(chunk.toString().toUpperCase());
  callback();
};
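
To see how these pieces fit together, here is a sketch that pipes data through the transform stream, assuming the readableStream and writableStream from the earlier examples are defined in the same script:

// Data flows: readableStream -> transformStream -> writableStream
readableStream.pipe(transformStream).pipe(writableStream);

readableStream.push('hello streams!');
readableStream.push(null); // Signal the end of the data

// The writable stream logs: HELLO STREAMS!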

Web Streams

In Web streams, we still see some of the concepts we have seen in Node Streams: there are Readable, Writable, Duplex, and Transform streams. However, they also have significant differences. Unlike Node streams, Web streams are not Event Emitter-based but Promise-based. Another big difference is that Node streams support multiple event listeners at the same time, and each of those listeners gets a copy of the data, while Web streams are restricted to a single reader at a time and have no events at all; they are purely Promise-based.

While Node streams work entirely synchronously, Web streams, because they are Promise-based, work entirely asynchronously. This happens because Promise continuations are always deferred to the Microtask queue.

It is essential to highlight that Node streams are significantly faster than Web streams, while Web streams are much more portable than Node streams. So keep that in mind when you are planning to select one or the other.

Readable stream

To understand the underlying implementation of how this works, let's take the Readable stream and dissect it.

The developer can pass an underlyingSource object to the constructor when creating a new Readable stream. Inside this object, we can define the pull(controller) function, which receives the Controller object as its first and only parameter. This is the function where the client code defines its custom data-pulling logic and pushes data into the stream. The Controller object, the argument of the pull() function, exposes the enqueue() function. This is the equivalent of the Node Readable stream push() function, and it is used to push data into the Readable stream.

The reader object is the element through which the Web Readable stream enforces a single listener or reader at a time. It can be accessed through the getReader() method of the stream. On the reader object, we can find the read() function, which returns a promise providing access to the next chunk in the stream's internal queue.

When the client code calls read() on the reader object, the stream looks into the Controller data queue and checks whether there is any data. If there is data in the queue, it simply dequeues the first chunk and resolves the read promise with it. If there is no data in the queue, the stream calls the pull(controller) function defined in the stream constructor argument object. The pull(controller) function then runs and, as part of its execution, calls the Controller's enqueue() function to push data into the stream. After the data is pushed in, the stream resolves the initial read() Promise with it.

Because the pull(controller) function can be an async function and return a Promise, if it has been called once and that Promise has not resolved yet, and the client code keeps calling the read() function, those read promises accumulate in a read queue. Now we have our data queue and the read queue, and every time new data is enqueued into the data queue, the stream logic checks whether there is a pending read promise in the read queue. If the read queue is not empty, the first pending Promise is dequeued and resolved with the data enqueued in the data queue. We can see this process as the read queue and the data queue trying to balance each other out. Consequently, the read queue will not accumulate read promises if the data queue has data, and the opposite is also true: we will not accumulate data in the data queue if the read queue has unresolved promises.

Example

Creating a Readable Stream

const readableStream = new ReadableStream({
  start(controller) {
    /* ... */
  },

  // Our pull function
  pull(controller) {
    /* Get some data */
    controller.enqueue(someData);
    /* ... */
  },

  cancel(reason) {
    /* ... */
  },
});
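
And here is a sketch of consuming it through the reader object, assuming the readableStream created above:

const reader = readableStream.getReader();

async function consume() {
  while (true) {
    // Each read() call returns a Promise that resolves with the next chunk
    const { value, done } = await reader.read();
    if (done) break;
    console.log('Chunk:', value);
  }
}

consume();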

Conclusions

There is immense beauty in the depths for those curious developers who are not afraid of diving into the deep waters of the Node.js inner mechanics.

In this article, we barely started exploring some of the solutions chosen by the Node.js core contributors, and we have seen how elegant and straightforward many of those solutions are.

There is still much to learn about the internals of Node and the Browser engines, but I think this article and its companion Javascript Marathon are a great starting point.

If you got this far, thank you, and I hope you feel inspired to continue digging into the tools that allow us to unleash our passion and creativity.

