General

Are you an AI Engineer? What is RAG? AI Implemented with Tracy Lee and Rob Ocel

Published May 30, 2024

Updated Dec 11, 2024

1 min read

In this episode of the Modern Web podcast, Tracy Lee and Rob Ocel discuss how AI can revolutionize processes and enhance efficiency, highlighting Tracy's exploration of RAG as an example. RAG is retrieval augmented generation, making it easy to implement AI by connecting to databases and leveraging large datasets. This opens up exciting possibilities for businesses to automate tasks, generate personalized content, and provide enhanced customer experiences.

Implementing AI comes with challenges, and Tracy and Rob openly share their experiences and insights. They offer tips for effectively using AI tools and emphasize understanding the limitations and biases that can arise. The hosts also discuss the controversial GitHub Copilot, an AI-powered coding assistant, and the ethical considerations surrounding its use.

Beyond AI, Tracy and Rob highlight the importance of networking in the tech industry. They share their experiences at tech conferences like City JS, Cascadia JS, and Render, underscoring the value of creating meaningful connections with like-minded professionals. Attending conferences provides opportunities to learn from industry experts and opens doors for collaborations, mentorships, and career growth.

Download this episode.

This Dot is a consultancy dedicated to guiding companies through their modernization and digital transformation journeys. Specializing in replatforming, modernizing, and launching new initiatives, we stand out by taking true ownership of your engineering projects.

We love helping teams with projects that have missed their deadlines or helping keep your strategic digital initiatives on course. Check out our case studies and our clients that trust us with their engineering.

Dismantling Your AI Bias with Jerome Hardaway and Tracy Lee

In this inaugural episode in a series on the six steps to AI adoption, Tracy Lee and Jerome Hardaway explore the impact of AI on various industries, emphasizing the need to address bias and adapt as developers. The first step is dismantling your own bias against AI. They advocate for treating AI as a tool to enhance human capabilities, and how it can revolutionize education and streamline workflows by augmenting our everyday tasks. Even better, the transformative potential of AI creates new job opportunities, necessitating education and upskilling initiatives to prepare individuals for the changing job market. By addressing bias, embracing continuous learning, and recognizing AI's capacity to augment human capabilities, we can unlock its full potential in shaping a better future. Download this podcast episode here....

Mar 12, 2024

1 min

Artificial Intelligence

How to build an AI assistant with OpenAI, Vercel AI SDK, and Ollama with Next.js

In today’s blog post, we’ll build an AI Assistant using three different AI models: Whisper and TTS from OpenAI and Llama 3.1 from Meta. While exploring AI, I wanted to try different things and create an AI assistant that works by voice. This curiosity led me to combine OpenAI’s Whisper and TTS models with Meta’s Llama 3.1 to build a voice-activated assistant. Here’s how these models will work together: * First, we’ll send our audio to the Whisper model, which will convert it from speech to text. * Next, we’ll pass that text to the Llama 3.1 model. Llama will understand the text and generate a response. * Finally, we’ll take Llama’s response and send it to the TTS model, turning the text back into speech. We’ll then stream that audio back to the client. Let’s dive in and start building this excellent AI Assistant! Getting started We will use different tools to build our assistant. To build our client side, we will use Next.js. However, you could choose whichever framework you prefer. To use our OpenAI models, we will use their TypeScript / JavaScript SDK. To use this API, we require the following environmental variable: OPENAI_API_KEY— To get this key, we need to log in to the OpenAI dashboard and find the API keys section. Here, we can generate a new key. Awesome. Now, to use our Llama 3.1 model, we will use Ollama and the Vercel AI SDK, utilizing a provider called ollama-ai-provider. Ollama will allow us to download our preferred model (we could even use a different one, like Phi) and run it locally. The Vercel SDK will facilitate its use in our Next.js project. To use Ollama, we just need to download it and choose our preferred model. For this blog post, we are going to select Llama 3.1. After installing Ollama, we can verify if it is working by opening our terminal and writing the following command: Notice that I wrote “llama3.1” because that’s my chosen model, but you should use the one you downloaded. 
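The three-step flow described above (speech to text, text to reply, reply to speech) can be sketched as a small pipeline. This is only an illustrative outline, not the post's actual code: the type names, `runAssistantTurn`, and the stub stages are all hypothetical, and the real stages would call Whisper, Llama 3.1 through Ollama, and the TTS model respectively.

```typescript
// Hypothetical stage signatures; none of these names come from the post or any SDK.
type SpeechToText = (audio: Uint8Array) => Promise<string>;
type GenerateReply = (prompt: string) => Promise<string>;
type TextToSpeech = (text: string) => Promise<Uint8Array>;

interface AssistantStages {
  transcribe: SpeechToText; // would be backed by Whisper
  reply: GenerateReply; // would be backed by Llama 3.1 running in Ollama
  synthesize: TextToSpeech; // would be backed by the TTS model
}

// Runs one voice turn: audio in, audio out, with each stage feeding the next.
async function runAssistantTurn(
  audio: Uint8Array,
  stages: AssistantStages
): Promise<Uint8Array> {
  const transcript = await stages.transcribe(audio);
  const answer = await stages.reply(transcript);
  return stages.synthesize(answer);
}

// Stub stages so the sketch runs without network access or an API key.
const stubStages: AssistantStages = {
  transcribe: async (audio) => `heard ${audio.length} bytes`,
  reply: async (prompt) => `You said: ${prompt}`,
  synthesize: async (text) => new TextEncoder().encode(text),
};
```

Keeping the stages behind plain function types means any one model can be swapped (for example, a different local model in Ollama) without touching the other two.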
Kicking things off It's time to kick things off by setting up our Next.js app. Let's start with this command: ` After running the command, you’ll see a few prompts to set the app's details. Let's go step by step: * Name your app. * Enable app router. The other steps are optional and entirely up to you. In my case, I also chose to use TypeScript and Tailwind CSS. Now that’s done, let’s go into our project and install the dependencies that we need to run our models: ` Building our client logic Now, our goal is to record our voice, send it to the backend, and then receive a voice response from it. To record our audio, we need to use client-side functions, which means we need to use client components. In our case, we don’t want to transform our whole page to use client capabilities and have the whole tree in the client bundle; instead, we would prefer to use Server components and import our client components to progressively enhance our application. So, let’s create a separate component that will handle the client-side logic. Inside our app folder, let's create a components folder, and here, we will be creating our component: ` Let’s go ahead and initialize our component. I went ahead and added a button with some styles in it: ` And then import it into our Page Server component: ` Now, if we run our app, we should see the following: Awesome! Now, our button doesn’t do anything, but our goal is to record our audio and send it to someplace; for that, let us create a hook that will contain our logic: ` We will use two APIs to record our voice: navigator and MediaRecorder. The navigator API will give us information about the user’s media devices like the user media audio, and the MediaRecorder will help us record the audio from it. This is how they’re going to play out together: ` Let’s explain this code step by step. First, we create two new states. The first one is for keeping track of when we are recording, and the second one stores the instance of our MediaRecorder. 
` Then, we’ll create our first method, startRecording. Here, we are going to have the logic to start recording our audio. We first check if the user has media devices available thanks to the navigator API that gives us information about the browser environment of our user: If we don’t have media devices to record our audio, we just return. If they do, then let us create a stream using their audio media device. ` Finally, we go ahead and create an instance of a MediaRecorder to record this audio: ` Then we need a method to stop our recording, which will be our stopRecording. Here, we will just stop our recording in case a media recorder exists. ` We are recording our audio, but we are not storing it anywhere. Let’s add a new useEffect and ref to accomplish this. We would need a new ref, and this is where our chunks of audio data will be stored. ` In our useEffect we are going to do two main things: store those chunks in our ref, and when it stops, we are going to create a new Blob of type audio/mp3: ` It is time to wire this hook with our AudioRecorder component: ` Let’s go to the other side of the coin, the backend! Setting up our Server side We want to use our models on the server to keep things safe and run faster. Let’s create a new route and add a handler for it using route handlers from Next.js. In our App folder, let’s make an “Api” folder with the following route in it: ` Our route is called ‘chat’. In the route.ts file, we’ll set up our handler. Let’s start by setting up our OpenAI SDK. ` In this route, we’ll send the audio from the front end as a base64 string. Then, we’ll receive it and turn it into a Buffer object. ` It’s time to use our first model. 
We want to turn this audio into text using OpenAI’s Whisper Speech-To-Text model. Whisper needs an audio file to create the text. Since we have a Buffer instead of a file, we’ll use their ‘toFile’ method to convert our audio Buffer into an audio file like this: ` Notice that we specified “mp3”. This is one of the many extensions that the Whisper model can use. You can see the full list of supported extensions here: https://platform.openai.com/docs/api-reference/audio/createTranscription#audio-createtranscription-file Now that our file is ready, let’s pass it to Whisper! Using our OpenAI instance, this is how we will invoke our model: ` That’s it! Now, we can move on to the next step: using Llama 3.1 to interpret this text and give us an answer. We’ll use two methods for this. First, we’ll use ‘ollama’ from the ‘ollama-ai-provider’ package, which lets us use this model with our locally running Ollama. Then, we’ll use ‘generateText’ from the Vercel AI SDK to generate the text. Side note: To make our Ollama run locally, we need to write the following command in the terminal: ` ` Finally, we have our last model: TTS from OpenAI. We want to reply to our user with audio, so this model will be really helpful. It will turn our text into speech: ` The TTS model will turn our response into an audio file. We want to stream this audio back to the user like this: ` And that’s the whole backend code! Now, back to the frontend to finish wiring everything up. Putting It All Together In our useRecordVoice.tsx hook, let's create a new method that will call our API endpoint. This method will also take the response and play the audio streamed from the backend to the user. ` Great! Now that we’re getting our streamed response, we need to handle it and play the audio back to the user. We’ll use the AudioContext API for this. This API allows us to store the audio, decode it and play it to the user once it’s ready: ` And that's it! 
Now the user should hear the audio response on their device. To wrap things up, let's make our app a bit nicer by adding a little loading indicator: ` Conclusion In this blog post, we saw how combining multiple AI models can help us achieve our goals. We learned to run AI models like Llama 3.1 locally and use them in our Next.js app. We also discovered how to send audio to these models and stream back a response, playing the audio back to the user. This is just one of many ways you can use AI—the possibilities are endless. AI models are amazing tools that let us create things that were once hard to achieve with such quality. Thanks for reading; now, it’s your turn to build something amazing with AI! You can find the complete demo on GitHub: AI Assistant with Whisper TTS and Ollama using Next.js...

Sep 27, 2024

8 mins

AI, NextJS, JavaScript

The 2025 Guide to JS Build Tools

In 2025, we're seeing the largest number of JavaScript build tools being actively maintained and used in history. Over the past few years, we've seen the trend of many build tools being rewritten or forked to use a faster and more efficient language like Rust or Go. In the last year, new companies have emerged, even with venture capital funding, with the goal of working on specific sets of build tools. Void Zero is one such recent example. With so many build tools around, it can be difficult to understand which one is for what. Hopefully, with this blog post, things will become a bit clearer. But first, let's explain some concepts. Concepts When it comes to build tools, there is no one-size-fits-all solution. Each tool typically focuses on one or two primary features, and often relies on other tools as dependencies to accomplish more. While it might be difficult to explain here all of the possible functionalities a build tool might have, we've attempted to explain some of the most common ones so that you can easily understand how tools compare. Minification The concept of minification has been in the JavaScript ecosystem for a long time, and not without reason. JavaScript is typically delivered from the server to the user's browser through a network whose speed can vary. Thus, there was a need very early in the web development era to compress the source code as much as possible while still making it executable by the browser. This is done through the process of *minification*, which removes unnecessary whitespace and comments and uses shorter variable names, reducing the total size of the file. This is what an unminified JavaScript file looks like: ` This is the same file, minified: ` Closely related to minification is the concept of source maps, which go hand in hand with it: source maps are essentially mappings between the minified file and the original source code. Why is that needed? 
Well, primarily for debugging minified code. Without source maps, understanding errors in minified code is nearly impossible because variable names are shortened, and all formatting is removed. With source maps, browser developer tools can help you debug minified code. Tree-Shaking *Tree-shaking* was the next-level upgrade from minification that became possible when ES modules were introduced into the JavaScript language. While a minified file is smaller than the original source code, it can still get quite large for larger apps, especially if it contains parts that are effectively not used. Tree shaking helps eliminate this by performing a static analysis of all your code, building a dependency graph of the modules and how they relate to each other, which allows the bundler to determine which exports are used and which are not. Once unused exports are found, the build tool will remove them entirely. This is also called *dead code elimination*. Bundling Development in JavaScript and TypeScript rarely involves a single file. Typically, we're talking about tens or hundreds of files, each containing a specific part of the application. If we were to deliver all those files to the browser, we would overwhelm both the browser and the network with many small requests. *Bundling* is the process of combining multiple JS/TS files (and often other assets like CSS, images, etc.) into one or more larger files. A bundler will typically start with an entry file and then recursively include every module or file that the entry file depends on, before outputting one or more files containing all the necessary code to deliver to the browser. As you might expect, a bundler will typically also involve minification and tree-shaking, as explained previously, in the process to deliver only the minimum amount of code necessary for the app to function. 
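The "start at an entry file and recursively include what it depends on" idea can be sketched with a toy module graph. This is a simplified illustration under stated assumptions: the file names, the `ModuleGraph` type, and `collectModules` are made up, the graph is hand-written rather than parsed from real imports, and it only drops whole unreachable modules, whereas real tree-shaking works at the level of individual exports.

```typescript
// Toy module graph: each key is a file, each value lists the files it imports.
// All file names are hypothetical.
type ModuleGraph = Record<string, string[]>;

// Returns every module reachable from the entry file, dependencies first.
function collectModules(entry: string, graph: ModuleGraph): string[] {
  const visited = new Set<string>();
  const order: string[] = [];

  function visit(file: string): void {
    if (visited.has(file)) return; // already included; also guards against import cycles
    visited.add(file);
    for (const dep of graph[file] ?? []) visit(dep);
    order.push(file); // emitted after its dependencies
  }

  visit(entry);
  return order;
}

const graph: ModuleGraph = {
  "main.ts": ["cart.ts", "utils.ts"],
  "cart.ts": ["utils.ts"],
  "utils.ts": [],
  "unused.ts": ["utils.ts"], // never imported from main.ts, so never bundled
};

const bundleOrder = collectModules("main.ts", graph);
console.log(bundleOrder); // [ 'utils.ts', 'cart.ts', 'main.ts' ]
```

A real bundler would then concatenate (and minify) the modules in this order; `unused.ts` never appears because nothing reachable from the entry imports it.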
Transpiling Once TypeScript arrived on the scene, it became necessary to translate it to JavaScript, as browsers did not natively understand TypeScript. Generally speaking, the purpose of a *transpiler* is to transform one language into another. In the JavaScript ecosystem, it's most often used to transpile TypeScript code to JavaScript, optionally targeting a specific version of JavaScript that's supported by older browsers. However, it can also be used to transpile newer JavaScript to older versions. For example, arrow functions, which are specified in ES6, are converted into regular function declarations if the target language is ES5. Additionally, a transpiler can also be used by modern frameworks such as React to transpile JSX syntax (used in React) into plain JavaScript. Typically, with transpilers, the goal is to maintain similar abstractions in the target code. For example, transpiling TypeScript into JavaScript might preserve constructs like loops, conditionals, or function declarations that look natural in both languages. Compiling While a transpiler's purpose is to transform from one language to another without or with little optimization, the purpose of a *compiler* is to perform more extensive transformations and optimizations, or translate code from a high-level programming language into a lower-level one such as bytecode. The focus here is on optimizing for performance or resource efficiency. Unlike transpiling, compiling will often transform abstractions so that they suit the low-level representation, which can then run faster. Hot-Module Reloading (HMR) *Hot-module reloading* (HMR) is an important feature of modern build tools that drastically improves the developer experience while developing apps. In the early days of the web, whenever you'd make a change in your source code, you would need to hit that refresh button on the browser to see the change. 
This would become quite tedious over time, especially because with a full-page reload, you lose all the application state, such as the state of form inputs or other UI components. With HMR, we can update modules in real-time without requiring a full-page reload, speeding up the feedback loop for any changes made by developers. Not only that, but the full application state is typically preserved, making it easier to test and iterate on code. Development Server When developing web applications, you need to have a locally running development server set up on something like http://localhost:3000. A development server typically serves unminified code to the browser, allowing you to easily debug your application. Additionally, a development server will typically have hot module replacement (HMR) so that you can see the results on the browser as you are developing your application. The Tools Now that you understand the most important features of build tools, let's take a closer look at some of the popular tools available. This is by no means a complete list, as there have been many build tools in the past that were effective and popular at the time. However, here we will focus on those used by the current popular frameworks. In the table below, you can see an overview of all the tools we'll cover, along with the features they primarily focus on and those they support secondarily or through plugins. The tools are presented in alphabetical order below. Babel Babel, which celebrated its 10th anniversary since its initial release last year, is primarily a JavaScript transpiler used to convert modern JavaScript (ES6+) into backward-compatible JavaScript code that can run on older JavaScript engines. Traditionally, developers have used it to take advantage of the newer features of the JavaScript language without worrying about whether their code would run on older browsers. 
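To make the ES6-to-ES5 conversion concrete, here is a hand-written before-and-after pair. It is a sketch of the kind of output a transpiler targeting ES5 would produce, not actual Babel output; the function names are hypothetical, and the type annotations are there only because the example is written in TypeScript.

```typescript
// ES2015 (ES6) source: an arrow function with a template literal.
const greet = (name: string): string => `Hello, ${name}!`;

// Roughly what an ES5-targeting transpiler would emit for the same logic:
// `var`, a function expression, and plain string concatenation.
var greetEs5 = function (name: string): string {
  return "Hello, " + name + "!";
};

console.log(greet("Ada") === greetEs5("Ada")); // true
```

The two versions behave identically here. Arrow functions also capture `this` lexically, so for code that uses `this` a real transpiler has to do a little more work than this simple rewrite shows.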
esbuild esbuild, created by Evan Wallace, the co-founder and former CTO of Figma, is primarily a bundler that advertises itself as being one of the fastest bundlers in the market. Unlike all the other tools on this list, esbuild is written in Go. When it was first released, it was unusual for a JavaScript bundler to be written in a language other than JavaScript. However, this choice has provided significant performance benefits. esbuild supports ESM and CommonJS modules, as well as CSS, TypeScript, and JSX. Unlike traditional bundlers, esbuild creates a separate bundle for each entry point file. Nowadays, it is used by tools like Vite and frameworks such as Angular. Metro Unlike other build tools mentioned here, which are mostly web-focused, Metro's primary focus is React Native. It has been specifically optimized for bundling, transforming, and serving JavaScript and assets for React Native apps. Internally, it utilizes Babel as part of its transformation process. Metro is sponsored by Meta and actively maintained by the Meta team. Oxc The JavaScript Oxidation Compiler, or Oxc, is a collection of Rust-based tools. Although it is referred to as a compiler, it is essentially a toolchain that includes a parser, linter, formatter, transpiler, minifier, and resolver. Oxc is sponsored by Void Zero and is set to become the backbone of other Void Zero tools, like Vite. Parcel Feature-wise, Parcel covers a lot of ground (no pun intended). Largely created by Devon Govett, it is designed as a zero-configuration build tool that supports bundling, minification, tree-shaking, transpiling, compiling, HMR, and a development server. It can handle all the types of assets you will need, from JavaScript to HTML, CSS, and images. The core part of it is mostly written in JavaScript, with a CSS transformer written in Rust, whereas it delegates the JavaScript compilation to SWC. Likewise, it also has a large collection of community-maintained plugins. 
Overall, it is a good tool for quick development without requiring extensive configuration. Rolldown Rolldown is the future bundler for Vite, written in Rust and built on top of Oxc, currently leveraging its parser and resolver. Inspired by Rollup (hence the name), it will provide Rollup-compatible APIs and plugin interface, but it will be more similar to esbuild in scope. Currently, it is still in heavy development and it is not ready for production, but we should definitely be hearing more about this bundler in 2025 and beyond. Rollup Rollup is the current bundler for Vite. Originally created by Rich Harris, the creator of Svelte, Rollup is slowly becoming a veteran (speaking in JavaScript years) compared to other build tools here. When it originally launched, it introduced novel ideas focused on ES modules and tree-shaking, at the time when Webpack as its competitor was becoming too complex due to its extensive feature set - Rollup promised a simpler way with a straightforward configuration process that is easy to understand. Rolldown, mentioned previously, is hoped to become a replacement for Rollup at some point. Rsbuild Rsbuild is a high-performance build tool written in Rust and built on top of Rspack. Feature-wise, it has many similarities with Vite. Both Rsbuild and Rspack are sponsored by the Web Infrastructure Team at ByteDance, which is a division of ByteDance, the parent company of TikTok. Rsbuild is built as a high-level tool on top of Rspack that has many additional features that Rspack itself doesn't provide, such as a better development server, image compression, and type checking. Rspack Rspack, as the name suggests, is a Rust-based alternative to Webpack. It offers a Webpack-compatible API, which is helpful if you are familiar with setting up Webpack configurations. However, if you are not, it might have a steep learning curve. 
To address this, the same team that built Rspack also developed Rsbuild, which helps you achieve a lot with out-of-the-box configuration. Under the hood, Rspack uses SWC for compiling and transpiling. Feature-wise, it’s quite robust. It includes built-in support for TypeScript, JSX, Sass, Less, CSS modules, Wasm, and more, as well as features like module federation, PostCSS, Lightning CSS, and others. Snowpack Snowpack was created around the same time as Vite, with both aiming to address similar needs in modern web development. Their primary focus was on faster build times and leveraging ES modules. Both Snowpack and Vite introduced a novel idea at the time: instead of bundling files while running a local development server, like traditional bundlers, they served the app unbundled. Each file was built only once and then cached indefinitely. When a file changed, only that specific file was rebuilt. For production builds, Snowpack relied on external bundlers such as Webpack, Rollup, or esbuild. Unfortunately, Snowpack is a tool you’re likely to hear less and less about in the future. It is no longer actively developed, and Vite has become the recommended alternative. SWC SWC, which stands for Speedy Web Compiler, can be used for both compilation and bundling (with the help of SWCpack), although compilation is its primary feature. And it really is speedy, thanks to being written in Rust, as are many other tools on this list. Primarily advertised as an alternative to Babel, SWC is roughly 20x faster than Babel on a single thread. SWC compiles TypeScript to JavaScript, JSX to JavaScript, and more. It is used by tools such as Parcel and Rspack and by frameworks such as Next.js, which use it for transpiling and minification. SWCpack is the bundling part of SWC. However, active development within the SWC ecosystem is not currently a priority. 
The main author of SWC now works on Turbopack at Vercel, and the documentation states that SWCpack is presently not in active development. Terser Terser has the smallest scope compared to other tools from this list, but considering that it's used in many of those tools, it's worth separating it into its own section. Terser's primary role is minification. It is the successor to the older UglifyJS, but with better performance and ES6+ support. Vite Vite is somewhat of a special beast. It's primarily a development server, but calling it just that would be an understatement, as it combines the features of a fast development server with modern build capabilities. Vite shines in different ways depending on how it's used. During development, it provides a fast server that doesn't bundle code like traditional bundlers (e.g., Webpack). Instead, it uses native ES modules, serving them directly to the browser. Since the code isn't bundled, Vite also delivers fast HMR, so any updates you make are nearly instant. Vite uses two bundlers under the hood. During development, it uses esbuild, which also allows it to act as a TypeScript transpiler. For each file you work on, it creates a file for the browser, allowing an easy separation between files which helps HMR. For production, it uses Rollup, which generates a single file for the browser. However, Rollup is not as fast as esbuild, so production builds can be a bit slower than you might expect. (This is why Rollup is being rewritten in Rust as Rolldown. Once complete, you'll have the same bundler for both development and production.) Traditionally, Vite has been used for client-side apps, but with the new Environment API released in Vite 6.0, it bridges the gap between client-side and server-rendered apps. Turbopack Turbopack is a bundler, written in Rust by the creators of webpack and Next.js at Vercel. 
The idea behind Turbopack was to do a complete rewrite of Webpack from scratch and try to keep a Webpack compatible API as much as possible. This is not an easy feat, and this task is still not over. The enormous popularity of Next.js is also helping Turbopack gain traction in the developer community. Right now, Turbopack is being used as an opt-in feature in Next.js's dev server. Production builds are not yet supported but are planned for future releases. Webpack And finally we arrive at Webpack, the legend among bundlers which has had a dominant position as the primary bundler for a long time. Despite the fact that there are so many alternatives to Webpack now (as we've seen in this blog post), it is still widely used, and some modern frameworks such as Next.js still have it as a default bundler. Initially released back in 2012, its development is still going strong. Its primary features are bundling, code splitting, and HMR, but other features are available as well thanks to its popular plugin system. Configuring Webpack has traditionally been challenging, and since it's written in JavaScript rather than a lower-level language like Rust, its performance lags behind compared to newer tools. As a result, many developers are gradually moving away from it. Conclusion With so many build tools in today's JavaScript ecosystem, many of which are similarly named, it's easy to get lost. Hopefully, this blog post was a useful overview of the tools that are most likely to continue being relevant in 2025. Although, with the speed of development, it may as well be that we will be seeing a completely different picture in 2026!...

Feb 14, 2025

12 mins

JavaScript, DevTools

The Importance of a Scientific Mindset in Software Engineering: Part 1 (Source Evaluation & Literature Review)

Today, I will write about something very dear to me - science. But not about science as a field of study but rather as a way of thinking. It's easy nowadays to get lost in the sea of information, fall for marketing hype, or even be trolled by a hallucinating LLM. A scientific mindset can be a powerful tool for navigating the complex modern world and the world of software engineering in particular. Not only is it a powerful tool, but I'll argue that it's a must nowadays if you want to make informed decisions, solve problems effectively, and become a better engineer. As software engineers, we are constantly confronted with an overwhelming array of frameworks, technologies, and infrastructure choices. Sometimes, it feels like there's a new tool or platform every day, each accompanied by its own wave of hype and marketing. It's easy to feel lost in the myriad of information or even suffer from FOMO and insecurity about not jumping on the latest bandwagon. But it's not only about the abundance of information and making technological decisions. As engineers, we often write documentation, blog posts, talks, or even books. We need to be able to communicate our ideas clearly and effectively. Furthermore, we have to master the art of debugging code, which is essentially a scientific process where we form hypotheses, test them, and iterate until we find the root cause of the problem. Therefore, here's my hot take: engineering is a science; hence, to deserve an engineer title, one needs to think like a scientist. So, let's _review_ (pun intended) what it means to think like a scientist in the context of software engineering. Systematic Review In science, systematic review is not only an essential means to understand a topic and map the current state of knowledge in the field, but it also has a well-defined methodology. 
You can't just google whatever supports your hypothesis and call it a day. You must define your research question, choose the databases you will search, set your inclusion and exclusion criteria, systematically search for relevant studies, evaluate their quality, and synthesize the results. Most importantly, you must be transparent about and describe your methodology in detail. The general process of systematic review can be summarized in the following steps: 1. Define your research question(s) 2. Choose databases and other sources to search 3. Define keywords and search terms 4. Define inclusion and exclusion criteria a. Define practical criteria such as publication date, language, etc. b. Define methodological criteria such as study design, sample size, etc. 5. Search for relevant studies 6. Evaluate the quality of the studies 7. Synthesize the results Source: Conducting Research Literature Reviews: From the Internet to Paper by Dr. Fink I'm pretty sure you can see where I'm going with this. There are many use cases in software engineering where a process similar to systematic review can be applied. Whether you're evaluating a new technology, choosing a tech stack for a new project, or researching for a blog post or a conference talk, it's important to be systematic in your approach, transparent about your methodology, and honest about the limitations of your research. Of course, when choosing a tech stack to learn or researching for a blog post, you don't have to be as rigorous as in a scientific study. But a few of these steps will always be worth following. Let's focus on those and see how we can apply them in the context of software engineering. Defining Your Research Question(s) Before you start researching, it's important to define your research questions. What are you trying to find out? What problem are you trying to solve? What are the goals of your research? These questions will help you stay focused and avoid focusing on irrelevant information. 
> A practical example: If you're evaluating, say, whether to use bundler _A_ or bundler _B_ without a clear research question, you might end up focusing on marketing claims about how bundler _A_ is faster than bundler _B_ or how bundler _B_ is more popular than bundler _A_, even though such aspects may have minimal impact on your project. With a clear research question, you can focus on what really matters for your project, like how well each bundler integrates with your existing tools, how well they handle your specific use case, or how well they are maintained.

A research question is not a hypothesis - you don't have to have a clear idea of the answer. It's more about defining the scope of your research and setting clear goals. It can be as simple and general as "What are the pros and cons of using React vs. Angular for a particular project?" but also more specific and focused, like "What are the legal implications of using open-source library _X_ for purpose _Y_ in project _Z_?".

You can have multiple research questions, but keeping them focused and relevant to your goals is essential. In my personal opinion, part of the scientific mindset is automatically having at least a vague idea of a research question in your head whenever you're facing a problem or a decision, and that alone can make you a more confident and effective engineer.

Choosing Databases and Other Sources to Search

In engineering, some information (especially when researching rare bugs) can be scarce, and you have to search wherever you can and take what you can get. Hence, this step is arguably much easier in science, where you can include well-established databases and publications in your search. Information in science is simply more standardized and easier to find.

There are, however, still some decisions to be made about where to search. Do you want to include community websites like Stack Overflow or Reddit?
Do you want to include marketing materials from the companies behind the technologies you're evaluating? These can all be valid sources of information, but they have their limitations and biases, and it's important to be aware of them.

Or do you want to ask an LLM? I left LLMs out of the list of valid sources of information on purpose, as they are not literature databases in the traditional sense, and I wouldn't consider them a search source for literature research. And for a very good reason - they are essentially a black box, and therefore, you cannot reliably describe a reproducible methodology of your search. That doesn't mean you shouldn't ask an LLM for inspiration or a TL;DR, but you should always verify the information you get from them and be aware of their limitations.

Defining Keywords and Search Terms

This section will be short, as most of you are familiar with the concept of keywords and search terms and how to use search engines. However, I still wanted to highlight how important it is for a software engineer to know how to search effectively. It's not just about typing in a few keywords and hoping for the best. It's about learning how to use advanced search operators, filter out irrelevant results, and find the information you need quickly and efficiently.

If you're not familiar with advanced search operators, I highly recommend you take some time to learn them, for example, at FreeCodeCamp. Please note, however, that the article is specific to Google, and different search engines may have different operators and syntax. This is especially true for scientific databases, which often have their own search syntax and operators. So, if you're doing more formal research, familiarize yourself with the database's search syntax. The underlying principles, however, are pretty much the same everywhere; just the syntax and UI might differ.

With a solid search strategy in place, the next critical step is to assess the quality of the information we find.
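As a small illustration of the search operators mentioned above, here is a minimal sketch of how a query could be written down in a reproducible form. The `SearchQuery` class is a hypothetical helper made up for this example, not part of any library, and the search terms are invented. It assumes only three widely supported Google-style operators: quoted exact phrases, `-` exclusions, and `site:` restrictions. Other search engines and scientific databases may use different syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchQuery:
    """A search query written down explicitly (hypothetical helper)."""

    terms: List[str]
    exact_phrases: List[str] = field(default_factory=list)
    excluded_terms: List[str] = field(default_factory=list)
    site: Optional[str] = None

    def render(self) -> str:
        """Build the query string using Google-style operators."""
        parts = list(self.terms)
        # Quotes ask the engine to match the phrase exactly.
        parts += [f'"{phrase}"' for phrase in self.exact_phrases]
        # A leading minus excludes results containing the term.
        parts += [f"-{term}" for term in self.excluded_terms]
        # site: restricts results to a single domain.
        if self.site:
            parts.append(f"site:{self.site}")
        return " ".join(parts)


# Illustrative values only; substitute your own terms and sources.
query = SearchQuery(
    terms=["bundler", "monorepo"],
    exact_phrases=["out of memory"],
    excluded_terms=["tutorial"],
    site="github.com",
)
print(query.render())
# prints: bundler monorepo "out of memory" -tutorial site:github.com
```

Keeping the query as data rather than retyping it ad hoc is one possible way to make a search repeatable and easy to describe when you document your methodology.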
Methodological Criteria and Evaluation of Sources

This is where things get interesting. In science, evaluating the quality of the studies is a crucial step in the systematic review process. You can't just take the results of a study at face value - you need to critically evaluate its design, the sample size, the methodology, and the conclusions - and you need to be aware of the limitations of the study and the potential biases that may have influenced the results.

In science, there is a pretty straightforward yet helpful categorization of sources that my students surprisingly needed help understanding because no one ever explained it to them. So let me lay out and explain the three categories to you now:

1. Primary sources

Primary sources represent original research. You can find them in studies, conference papers, etc. In science, this is what you generally want to cite in your own research. However, remember that only some of what you find in an original research paper is a primary source. Only the parts that present the original research are primary sources. For example, the introduction can contain citations to other studies, which are not primary, but secondary sources.

While primary sources can sometimes be perceived as hard to read and understand, in many cases, they can actually be easier to approach and understand, as the methods and results are usually presented in a condensed form in the abstract, and often you only need to skim the introduction and discussion to get a good idea of the study.

In software engineering, primary sources can sometimes be papers, but more often, they are original documentation, case studies, or even blog posts that present original research or data. For example, if you're evaluating a new technology, the official documentation, case studies, and blog posts from its developers can be considered primary sources.

2. Secondary sources

Secondary sources are typically reviews, meta-analyses, and other sources that summarize, analyze, or reference the primary sources. A good way to identify a source as secondary is to look for citations to other studies. If a claim has a citation, it's likely a secondary source. On the other hand, something is likely wrong if it doesn't have a citation and doesn't seem to present original research.

Secondary sources can be very useful for getting an overview of a topic, understanding the current state of knowledge, and finding relevant primary sources. Meta-analyses, in particular, can provide a valuable perspective on a subject by combining the results of multiple studies and looking for patterns and trends.

The downside of secondary sources is that they can introduce information noise, as they add another layer of interpretation and analysis. So, it's always a good idea to go back to the primary sources and verify the information you get from secondary sources.

Secondary sources in software engineering include blog posts, talks, or articles that summarize, analyze, or reference primary sources. For example, if you're researching a new technology, a blog post that compares different technologies based on their documentation and/or studies made by their authors can be considered a secondary source.

3. Tertiary sources

Tertiary sources represent a further level of abstraction. They are typically textbooks, encyclopedias, and other sources that summarize, analyze, or reference secondary sources. They are useful for getting a broad overview of a topic, understanding the basic concepts, and finding relevant secondary sources.

One example I see as a tertiary source is Wikipedia, and while you shouldn't ever cite Wikipedia in academic research, it can be a good starting point for getting an overview of a topic and finding relevant primary and secondary sources, as you can easily click through the references.
> Note: It's fine to reference Wikipedia in a blog post or a talk to give your audience a convenient explanation of a term or concept. I'm even doing it in this post. However, you should always verify that the article is up to date and that the information is correct.

The distinction between primary, secondary, and tertiary sources in software engineering is not as clear-cut as in science, but the general idea still applies. When researching a topic, knowing the different types of sources and their limitations is essential. Primary sources are generally the most reliable and should be your go-to when seeking evidence to support your claims. Secondary sources can help you get an overview of a topic, but they should be used cautiously, as they can introduce bias and noise. Tertiary sources are good for getting a broad overview of a topic but should not be used as evidence in academic research.

Evaluating Sources

Now that we have the categories laid out, let's talk about evaluating the quality of the sources because, realistically, not all sources are created equal. In science, we have some well-established criteria for evaluating the quality of a source. Some focus on the general credibility of the source, like the reputation of the journal or the author, while others focus on the quality of the study itself, like the study design, the sample size, and the methodology.

First, we usually look at the number of citations and the impact factor of the journal in which the study was published. These numbers can give us an idea of how well the scientific community received the study and how much other researchers have cited it.

In software engineering, we don't have the concept of impact factor when it comes to researching a concept or a technology, but we can still look at how many people are sharing a particular piece of information, how well the professional community receives it, and how reputable the person sharing it is.
Second, we look at the study design and the methodology. Does the study have a clear research question? Is the study design appropriate for the research question? Are the methods well-described and reproducible? Are the results presented clearly and honestly? Do the data support the conclusions?

Arguably, in software engineering, the honest and clear presentation of the method and results can be even more important than in science, given the amounts of money circulating in the industry and the potential for conflicts of interest. Therefore, it's important to understand where the data is coming from, how it was collected, and how it was analyzed. If a company (or their DevRel person) is presenting data that show their product is the best (fastest, most secure...), it's important to be aware of the potential biases and conflicts of interest that may have influenced the results.

The ways in which the results can be skewed may include:

- Missing, incomplete, or inappropriate methodology. Often, the methodology is not described in enough detail to be reproducible, or the whole experiment is designed in a way that doesn't actually answer the research question properly. For example, the methodology can omit important details, such as the environment in which the experiment was conducted or even the way the data was collected (e.g., to hide selection bias).
- Selection bias can be a common issue in software engineering experiments. For example, if someone is comparing two technologies, they might choose a dataset that they expect to perform better with one of the technologies or a metric that they expect to show a difference. Selection bias can lead to skewed results that don't reflect the technologies' real-world performance.
- Publication bias is a common issue in science, where studies that show a positive result are more likely to be published than studies that show a negative outcome. In software engineering, this can manifest as a bias towards publishing success stories and case studies, while ignoring failures and negative results.
- Confirmation bias is a problem in science and software engineering alike. It's the tendency to look for evidence that confirms your hypothesis and ignore evidence that contradicts it. Confirmation bias can lead to cherry-picking data, misinterpreting results, and drawing incorrect conclusions.
- Conflict of interest. While less common in academic research, conflicts of interest can be a big issue in industry research. If a company is funding a study that shows its product in a positive light, it's important to be aware of the potential biases that may have influenced the results.

Another thing we look at is the conclusions. Do the data support the conclusions? Are they reasonable and justified? Are they overstated or exaggerated? Are the limitations of the study acknowledged? Are the implications of the study discussed? It all goes back to honesty and transparency, which is crucial for evaluating the quality of the source.

Last but not least, we should look at the citations and references included in the source. In the same way we apply the systematic review process to our research, we should also apply it to the sources we use. I would argue that this is even more important in software engineering, where the information is often less standardized, and you come across many unsupported claims. If a source doesn't provide citations or references to back up their claims, it's a red flag that the information may not be reliable.

This brings us to something called anecdotal evidence. Anecdotal evidence is a personal story or experience used to support a claim. While anecdotal evidence can be compelling and persuasive, it is generally considered a weak form of evidence, as it is based on personal experience rather than empirical data.
So when someone tells you that X is better than Y because they tried it and it worked for them, or that Z is true because they heard it from someone, take it with a massive grain of salt and look for more reliable sources of information. That, of course, doesn't mean you should ask for a source under every post on social media, but it's important to recognize what's a personal opinion and what's a claim based on evidence.

Synthesizing the Results

Once you have gathered all the relevant information, it's time to synthesize the results. This is where you combine all the evidence you have collected, analyze it, and draw conclusions.

In science, this is often done as part of a meta-analysis, where the results of multiple studies are combined and analyzed to look for patterns and trends using statistical methods. A meta-analysis is a powerful tool for synthesizing the results of multiple studies and drawing more robust conclusions than can be drawn from any single study.

You might not be doing a formal meta-analysis in software engineering, but you can still apply the same principles to your research. Look for common themes and trends in the information you have gathered, compare and contrast different sources, and draw conclusions based on the evidence.

Conclusion

Adopting a scientific way of thinking isn't just a nice-to-have in software engineering - it's essential to make informed decisions, solve problems effectively, and navigate the vast amount of information around you with confidence. Applying systematic review principles to your research allows you to gather reliable information, evaluate it critically, and draw sound conclusions based on evidence.

Let's summarize what such a systematic research approach can look like:

- Define Clear Research Questions:
  - Start every project or decision-making process by clearly stating what you aim to achieve or understand.
  - Example: "What factors should influence our choice between Cloud Service A and Cloud Service B for our application's specific needs?"
- Critically Evaluate Sources:
  - Identify the type of sources (primary, secondary, tertiary) and assess their credibility.
  - Be wary of biases and seek out multiple perspectives for a well-rounded understanding.
- Be Aware of Biases:
  - Recognize common biases that can cloud judgment, such as confirmation or selection bias.
  - Actively counteract these biases by seeking disconfirming evidence and questioning assumptions.
- Systematically Synthesize Information:
  - Organize your findings and analyze them methodically.
  - Use tools and frameworks to compare options based on defined criteria relevant to your project's goals.

I encourage you to embrace this scientific approach in your daily work. The next time you're facing a critical decision - be it selecting a technology stack, debugging complex code, or planning a project - apply these principles:

- Start with a Question: Clearly define what you need to find out.
- Gather and Evaluate Information: Seek out reliable sources and scrutinize them.
- Analyze Systematically: Organize your findings and look for patterns or insights.
- Make Informed Decisions: Choose the path supported by evidence and sound reasoning.

By doing so, you will enhance your problem-solving skills and contribute to a culture of thoughtful, evidence-based practice in the software engineering community.

The best part is that once you start applying a critical and systematic approach to your sources of information, it becomes second nature. You'll automatically start asking questions like, "Where did this information come from?" "Is it reliable?" and "Can I reproduce the results?" Doing so will make you much less susceptible to hype, marketing, and new shiny things, ultimately making you happier and more confident.
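To make the "compare options based on defined criteria" step from the summary a bit more tangible, here is a minimal sketch of a weighted decision matrix. It is only one possible way to organize findings, not a prescribed method. The function name `score_options`, the criteria, the weights, and the 1-5 ratings are all made up for illustration; in practice, the ratings should come from the evidence you gathered, and the weights from your research question.

```python
def score_options(weights, ratings):
    """Rank options by weighted average rating, highest first.

    weights: criterion -> relative importance for this project
    ratings: option -> {criterion -> rating on a 1-5 scale}
    """
    total_weight = sum(weights.values())
    scores = {}
    for option, option_ratings in ratings.items():
        # Refuse to rank an option that was not rated on every criterion,
        # so gaps in the research stay visible instead of being hidden.
        missing = set(weights) - set(option_ratings)
        if missing:
            raise ValueError(f"{option} has no rating for: {sorted(missing)}")
        weighted_sum = sum(weights[c] * option_ratings[c] for c in weights)
        scores[option] = round(weighted_sum / total_weight, 2)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights: integration matters most, raw speed least.
weights = {"integration": 5, "use_case_fit": 4, "maintenance": 3, "raw_speed": 1}

# Hypothetical ratings for the two bundlers from the earlier example.
ratings = {
    "bundler_a": {"integration": 3, "use_case_fit": 4, "maintenance": 4, "raw_speed": 5},
    "bundler_b": {"integration": 5, "use_case_fit": 4, "maintenance": 3, "raw_speed": 3},
}

ranking = score_options(weights, ratings)
print(ranking)
# prints: [('bundler_b', 4.08), ('bundler_a', 3.69)]
```

With these invented numbers, the faster bundler loses because speed was given a low weight. The point is not the result itself but that the criteria and weights are written down, so someone else can inspect and challenge them.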
In the next part of this series, we'll look at applying the scientific mindset to debugging and using hypothesis testing and experimentation principles to solve problems more effectively....

Jan 10, 2025

14 mins

Software Engineering

Let's innovate together!

We're ready to be your trusted technical partners in your digital innovation journey.

Whether it's modernization or custom software solutions, our team of experts can guide you through best practices and how to build scalable, performant software that lasts.


You might also like

Dismantling Your AI Bias with Jerome Hardaway and Tracy Lee

How to build an AI assistant with OpenAI, Vercel AI SDK, and Ollama with Next.js

The 2025 Guide to JS Build Tools

The Importance of a Scientific Mindset in Software Engineering: Part 1 (Source Evaluation & Literature Review)
