Cleaner Code, Happier ArchUnit: Extracting Nested Data Classes

Dec 5, 2025 by Admin 63 views

Hey everyone! Let's talk about something super important for keeping our codebase sparkling clean and our development process smooth: extracting nested data classes. You know, those little bits of technical debt that sometimes creep into our projects, making our architecture rules scream a little. We're on a mission to iron out some ArchUnit temporary exclusions and make our code truly shine. This isn't just about satisfying an automated test; it's about making our code more readable, maintainable, and ultimately, a joy to work with for everyone on the team.

Why Nested Data Classes Are an Architectural Headache

So, what's the big deal with nested data classes, you might ask? Well, guys, our ArchitectureTest.kt is currently carrying a temporaryExclusions set, which is basically a little note saying, "Hey, ignore these files for now, we know they break a rule." Specifically, it's bypassing the "one top-level class per file" architectural rule. This rule is a cornerstone of clean code practices, and for good reason. Imagine walking into a new codebase (or even your own after a long weekend!), and you're trying to find a specific data structure. If it's tucked away inside another service class, perhaps deeply nested, it's like trying to find a needle in a haystack. Readability takes a hit, and understanding the system's various components becomes unnecessarily difficult. When we keep data structures separate from the logic that operates on them, we create a much clearer separation of concerns. This design philosophy dramatically improves maintainability, making it easier to pinpoint issues, refactor safely, and onboard new team members. Violating this rule often leads to a tangled mess, where a single file can balloon into an unmanageable behemoth, hiding multiple concerns under one roof. By addressing these architectural shortcomings, we're not just fixing a linter warning; we're investing in the future health and scalability of our entire application.

Furthermore, having nested data classes means that if another part of the system needs to use that data structure, it has to import the entire outer class. This can lead to tighter coupling than necessary, where changes to the enclosing service might inadvertently affect other parts of the system that merely depend on its nested data structure. Think about it: a data class is essentially a definition of information. It's like a blueprint. Do you really want your blueprint for a small component to be embedded within the blueprint for an entire factory? Probably not! You want a clear, standalone blueprint that can be easily referenced and understood. This principle extends directly to our code. Our goal is to promote loose coupling and high cohesion, two fundamental pillars of robust software design. Separating these concerns makes our system more modular, allowing for easier testing, independent development, and clearer responsibilities. It's all about making our code more predictable and less prone to unexpected side effects, ultimately fostering a more resilient and adaptable system.

Our Current Situation: The ArchUnit Exclusion List

Right now, we've got a bit of a technical debt situation on our hands. The ArchitectureTest.kt file, our architectural guardian, has a temporaryExclusions set that is currently turning a blind eye to five specific service files. These files, listed below, are flagged for containing nested data classes that violate our "one top-level class per file" rule. While these exclusions allowed us to move forward quickly at some point, they represent a lingering architectural inconsistency that we need to tackle head-on. It's like having a leaky faucet that you've put a bucket under – it works for now, but it's not the permanent, clean solution we want.

InvestmentMetricsService.kt
EtfBreakdownService.kt
CalculationService.kt
InstrumentService.kt
LightyearHoldingsService.kt

These are important services, guys, and having them on an exclusion list means we're consciously allowing a deviation from our desired architectural standards. Over time, these small deviations can accumulate, making it harder to enforce rules consistently and understand the codebase's true structure. Removing these exclusions isn't just about passing a test; it's about elevating the quality of these core services and ensuring they align with our long-term vision for a clean, maintainable, and robust system. It's about saying goodbye to temporary fixes and embracing sustainable code health.

Identifying the Culprits: Nested Data Classes to Extract

Alright, let's get down to brass tacks and identify the specific nested data classes that are causing our ArchUnit tests to grumble. We've categorized them by priority, because, well, some issues are just a bit more urgent than others! The aim here is to pull these data structures out of their cozy nests within our service files and give them their own dedicated homes. This not only satisfies ArchUnit but, more importantly, makes our code much clearer and easier to manage. Think of it as decluttering a messy room – everything gets its proper place!

High Priority: Public Classes

These are the big ones, folks. The public nested data classes are a higher priority because they are often exposed or used in crucial interfaces, caches, or asynchronous operations. When a public class is nested, it means other parts of our application (or even external systems, in some cases) might be directly interacting with this nested structure. This creates a stronger coupling between the consumer and the containing service, which is generally something we want to avoid. Extracting these makes their contracts clearer and their usage more explicit, reducing potential side effects and improving overall system clarity. It’s like ensuring all your important public signs are standalone and easy to read, not hidden within a larger business report.

Let's dive into the specifics:

CalculationResult in CalculationService.kt: This data class, currently residing within CalculationService.kt, is a prime candidate for extraction. It spans lines 21-30 and, crucially, implements Serializable. This Serializable interface isn't just for show; it's a vital component that enables objects of this class to be converted into a stream of bytes, which is then often used for caching mechanisms (like Redis) or for transmitting data across networks. When CalculationResult is nested, it implicitly ties its lifecycle and its definition directly to CalculationService. While functional, this creates a less flexible and less discoverable data structure. Extracting CalculationResult.kt into its own file means that any component needing to store or retrieve calculation results from a cache can directly import CalculationResult without having to bring in CalculationService itself. This significantly reduces unnecessary dependencies and clarifies the data's independent role. Moreover, having a standalone CalculationResult improves code readability immensely. A developer looking to understand the output of a calculation service can immediately find its definition without scanning through service logic. This separation is a cornerstone of building highly maintainable and understandable applications, allowing us to evolve our caching strategies or calculation logic independently without fear of inadvertently breaking serialization or external consumers. This move ensures that our cache compatibility remains intact while boosting architectural clarity.
XirrCalculationResult in CalculationService.kt: Also found within CalculationService.kt, from lines 32-37, XirrCalculationResult is another public class that needs its own space. The note explicitly states it's returned from an async method. This means it's likely part of an asynchronous contract, potentially forming the payload of a Future or CompletableFuture. When a class is used as a return type for such methods, its definition becomes a critical part of the API contract. Nesting it within CalculationService obscures this contract, making it harder for consumers of the async method to immediately grasp the structure of the returned data. By moving XirrCalculationResult to its own XirrCalculationResult.kt file, we explicitly declare its role as a standalone data transfer object (DTO). This dramatically enhances the discoverability and clarity of our asynchronous APIs. Developers consuming these results can instantly locate the definition, understand its fields, and integrate it into their code with greater confidence. This refactoring doesn't just make ArchUnit happy; it makes our API contracts more robust and transparent, leading to less confusion and smoother integration points across our services. It's about providing a clear, unambiguous data blueprint for our async operations.
InstrumentEnrichmentContext in InstrumentService.kt: This data class, located in InstrumentService.kt (lines 24-28), is described as public and encapsulating enrichment parameters. This is a classic case where a data class acts as a dedicated container for a set of related inputs or parameters for a specific operation. When such a context class is nested, it forces any client code that needs to provide these enrichment parameters to depend directly on InstrumentService. This creates an artificial coupling between the client and the service, even if the client only cares about the context object itself. Extracting InstrumentEnrichmentContext into InstrumentEnrichmentContext.kt breaks this unnecessary bond. Now, clients can simply import InstrumentEnrichmentContext and construct it independently, then pass it to InstrumentService methods. This architectural change promotes better modularity and reusability. It means the definition of enrichment parameters is clear, independent, and easily discoverable. Furthermore, if the enrichment process itself were to evolve or be moved, the InstrumentEnrichmentContext could follow or be adapted without dragging InstrumentService along for the ride. This empowers us to design more flexible and adaptable systems, where core data definitions stand on their own.
HoldingKey in EtfBreakdownService.kt: Residing in EtfBreakdownService.kt (lines 29-33), HoldingKey is explicitly used as a Map key. This is a crucial detail because objects used as keys in maps (especially when dealing with custom objects) rely heavily on proper equals() and hashCode() implementations to ensure correct behavior. When HoldingKey is nested, its significance as a unique identifier is somewhat understated. By giving HoldingKey its own file, HoldingKey.kt, we elevate its status to a first-class data structure, emphasizing its role in uniquely identifying holdings. This separation makes it much clearer to developers that this is a fundamental building block for data organization. It also ensures that any future changes or considerations regarding how holdings are uniquely identified can be focused solely on HoldingKey.kt, rather than being intertwined with the business logic of EtfBreakdownService. This promotes a cleaner design where data definitions are distinct from the operations performed on that data, improving both readability and maintainability for everyone working with the ETF breakdown functionality.
HoldingValue in EtfBreakdownService.kt: Immediately following HoldingKey in EtfBreakdownService.kt (lines 35-39), HoldingValue serves as a Map value. Paired with HoldingKey, this data class represents the actual data associated with each unique holding. Just like its key counterpart, nesting HoldingValue within the service makes it less discoverable and couples its definition unnecessarily with the service's implementation details. Extracting HoldingValue to its own HoldingValue.kt file provides several benefits. Firstly, it visually clarifies that HoldingValue is a structured piece of data that describes a holding, independent of the EtfBreakdownService logic. Secondly, if the structure of HoldingValue needs to evolve (e.g., adding new fields or modifying existing ones), these changes can be made in its dedicated file without impacting the EtfBreakdownService's core functionality or forcing a recompilation of the service if only the data structure is modified. This separation fosters independent evolution of both data and logic. It’s a huge win for maintainability and allows for a clearer understanding of the data model underpinning our ETF breakdown features, ensuring that our data definitions are as robust and flexible as the services that use them.

Medium Priority: Private Classes

While these private nested data classes aren't exposed publicly, extracting them is still a significant step towards a cleaner architecture. Even private classes contribute to a file's complexity and can make testing more challenging if they’re deeply embedded. By pulling them out, we reduce the cognitive load when reading the main service file, making it easier to grasp its core responsibilities. Think of it as tidying up your private drawers – even if no one else sees them, a well-organized space just feels better and makes it easier for you to find what you need!

Let's get into these:

HoldingsAccumulator in InvestmentMetricsService.kt: This class (lines 47-64) is described as private and used in a fold operation. A HoldingsAccumulator sounds like a helper class specifically designed to aggregate or process data during a reduction operation, often within a stream or collection. While its private nature means it's not exposed externally, having a relatively substantial class (17 lines!) nested within InvestmentMetricsService.kt adds to the file's overall length and complexity. Extracting HoldingsAccumulator into its own HoldingsAccumulator.kt file immediately improves the readability of InvestmentMetricsService.kt. The service file can then focus purely on its high-level business logic, delegating the accumulation details to a dedicated, clearly named data class. This makes it easier to understand the InvestmentMetricsService's main purpose without getting bogged down in the intricacies of how data is accumulated. It also means if we ever need to test the accumulator logic in isolation or reuse it in a similar context, it's readily available as a standalone component. This refactoring contributes significantly to decoupling internal concerns and achieving a more modular and focused design within our services.
ValidatedRowData in LightyearHoldingsService.kt: Located in LightyearHoldingsService.kt (lines 90-95), ValidatedRowData is a private parsing helper. Parsing often involves intermediate data structures that hold partially processed or validated information. Nesting such a helper class within the main service file, especially when parsing logic can be complex, adds noise and reduces the clarity of the service's primary functions. By extracting ValidatedRowData into its own ValidatedRowData.kt file under a logical package like ee.tenman.portfolio.lightyear/, we achieve several benefits. Firstly, it makes the parsing process itself more transparent and modular. Developers can easily find the definition of what constitutes "validated row data." Secondly, it cleans up LightyearHoldingsService.kt, allowing it to concentrate on orchestrating the overall holdings processing rather than defining internal data structures for parsing. This separation makes the parsing logic more testable in isolation and enhances the overall readability of the service, ensuring that our internal helpers are as well-organized as our public interfaces.
BasicRowData in LightyearHoldingsService.kt: Also a private parsing helper in LightyearHoldingsService.kt (lines 122-125), BasicRowData likely serves a similar purpose to ValidatedRowData but perhaps at an earlier stage of the parsing pipeline. Even though it's a smaller class, its presence as a nested entity still detracts from the service's core focus. Extracting BasicRowData into BasicRowData.kt (again, within the ee.tenman.portfolio.lightyear/ package) contributes to the same goals as ValidatedRowData: improved modularity, enhanced readability, and clearer separation of concerns within the parsing logic. Having both ValidatedRowData and BasicRowData in their dedicated files helps to paint a clearer picture of the different stages and types of data involved in the LightyearHoldingsService's parsing responsibilities. This approach ensures that even internal helper structures are properly organized, making our codebase more robust and easier to navigate for anyone digging into the details of data ingestion and processing.

The Solution: A Path to Architectural Purity

Now that we've pinpointed the areas needing attention, let's talk about the solution. The path to architectural purity here is straightforward yet impactful: give each of these data classes its own dedicated file. This simple act dramatically improves our codebase's structure and makes ArchUnit, our architectural watchdog, much happier. By adhering to the "one top-level class per file" rule, we unlock a host of benefits that go beyond just passing tests; we build a more resilient, understandable, and scalable application. It's about empowering developers with a clear and consistent mental model of where everything lives.

New Files to Create

To achieve this, we'll be creating new files in a logical and organized manner. This ensures that our newly independent data classes are easily discoverable and fit naturally within our existing package structure. Following established conventions helps maintain clarity and consistency across the entire project.

src/main/kotlin/ee/tenman/portfolio/service/
├── CalculationResult.kt
├── XirrCalculationResult.kt
├── InstrumentEnrichmentContext.kt
├── HoldingKey.kt
├── HoldingValue.kt
└── HoldingsAccumulator.kt

src/main/kotlin/ee/tenman/portfolio/lightyear/
├── ValidatedRowData.kt
└── BasicRowData.kt

This structure reflects a clear separation, placing general service-related data classes in a common service package and Lightyear-specific parsing helpers in their designated lightyear package. This organization is key to future maintainability and helps new team members quickly grasp the project's layout.

Learning from the Best: Existing Patterns to Follow

Good news, folks! We're not reinventing the wheel here. Our codebase already has excellent examples of how properly extracted data classes should look and function. These existing patterns serve as a blueprint and confirmation that this refactoring aligns perfectly with our established best practices. By following these examples, we ensure consistency and leverage patterns that have already proven their worth. This makes the transition smooth and natural for everyone involved, reducing cognitive load and accelerating adoption.

We already have standalone data classes like:

InstrumentMetrics.kt
PortfolioMetrics.kt
TransactionState.kt
PriceChange.kt
InternalHoldingData.kt

These examples clearly demonstrate the benefits of dedicated files for data structures, making them easily discoverable, reusable, and testable. We're simply extending this proven approach to address our current technical debt, bringing more of our data definitions into the light.

What Success Looks Like: Acceptance Criteria

To ensure we've fully tackled this architectural task, we've laid out clear acceptance criteria. These points will guide our work and confirm that the refactoring is complete, robust, and doesn't introduce any regressions. Achieving these criteria means we've successfully cleaned up our code and reinforced our architectural standards.

[ ] Extract all 9 nested data classes to separate files: This is the core task. Every identified nested class must get its own .kt file.
[ ] Update imports in affected service files: Once classes are moved, all references in InvestmentMetricsService.kt, EtfBreakdownService.kt, CalculationService.kt, InstrumentService.kt, and LightyearHoldingsService.kt must be updated to reflect the new file locations.
[ ] Remove temporaryExclusions set from ArchitectureTest.kt: This is the ultimate goal. The ArchitectureTest should pass without any special waivers for these files.
[ ] All existing tests pass: No regressions! Our unit, integration, and end-to-end tests must all pass with flying colors, ensuring functionality remains unchanged.
[ ] ArchUnit architecture tests pass without exclusions: The final validation that our architecture is clean and compliant.

Navigating the Waters: Risk Assessment

Every refactoring, no matter how beneficial, comes with potential risks. However, we've carefully assessed the situation, and the good news is that for this specific task, the risks appear to be minimal and well-understood. Our analysis suggests that this refactoring is largely a structural change with a low probability of introducing functional bugs, thanks to the nature of data classes and the clear separation of concerns we are aiming for. We’ve considered the most common pitfalls and have confidence in our approach.

Serialization: CalculationResult has a serialVersionUID. This is super important because it ensures that objects serialized with an older version of the class can still be deserialized successfully with a newer one. The good news? Extraction is safe. Moving the class to its own file doesn't change its internal structure or how serialVersionUID functions. We'll simply be moving the definition, preserving its serialization compatibility. This means our caches and any persistence mechanisms relying on serialization will continue to work flawlessly.
Circular Dependencies: A common headache in refactoring is creating circular dependencies, where Class A depends on Class B, and Class B depends back on Class A. This can lead to compilation issues and complex, tightly coupled systems. Our assessment: None identified. All the data classes we're extracting are pure Data Transfer Objects (DTOs). They simply hold data and don't typically have business logic that would create dependencies back to the services that use them. This pure DTO nature makes the extraction process very low risk in terms of dependency cycles.
Cache Compatibility: We rely on caching for performance, and breaking that would be a nightmare! The fantastic news is that Redis cache keys remain unchanged. Our refactoring focuses on moving the definition of the data structures, not how they are used as keys or values in the cache. The actual data being stored and the keys used to retrieve it will be identical post-refactoring. This ensures no breaking changes to our caching layer, keeping our application speedy and efficient.

Why This Matters: The Big Picture

This refactoring isn't just about ticking boxes or silencing an ArchUnit warning, folks. It's about a much bigger picture – the long-term health, maintainability, and scalability of our entire portfolio application. Every line of code we write, every structure we define, contributes to the overall robustness (or fragility) of our system. By meticulously extracting these nested data classes, we're making a significant investment in our future.

Firstly, improved readability is a huge win. When data classes live in their own files, developers can instantly understand what data structures are available and what purpose they serve, without having to dig through service logic. This drastically reduces cognitive load, making it easier for new team members to onboard and for existing ones to navigate the codebase efficiently. Imagine a library where every book has its own distinct title and cover, rather than being a chapter hidden inside a larger, unrelated tome. That's the clarity we're aiming for.

Secondly, we're promoting better maintainability. Independent data classes mean that if the structure of CalculationResult needs to change, we modify only CalculationResult.kt. We don't risk inadvertently affecting CalculationService's logic unless it's directly related. This surgical precision in changes reduces the chance of introducing bugs and speeds up development cycles. It's about creating smaller, more manageable units that are easier to reason about and evolve.

Thirdly, this move enhances testability. When a data class is extracted, it becomes easier to create instances of it for unit testing purposes, both for the data class itself (if it has complex constructors or methods) and for the services that consume it. This leads to more focused and effective tests, which in turn leads to a more reliable application. We can ensure our data structures behave exactly as expected, in isolation, before integrating them into larger systems.

Finally, and perhaps most importantly, we are fortifying our architectural integrity. By removing temporaryExclusions from ArchitectureTest.kt, we are upholding our commitment to strong architectural principles. This consistency prevents architectural erosion, where small deviations eventually lead to a fragmented and difficult-to-manage system. It fosters a culture of discipline and attention to detail, which is invaluable for any growing project. A strong architectural foundation means our application is more resilient to change, easier to scale, and more predictable in its behavior. This refactoring is a critical step in ensuring our portfolio application remains a stellar example of well-crafted software.

Final Thoughts

So there you have it, folks! This refactoring task might seem small, but its impact on our codebase's health and our development efficiency is going to be huge. Let's get these nested data classes extracted, clean up those ArchUnit exclusions, and enjoy a more modular, readable, and maintainable portfolio application. Thanks for jumping on board and helping us make our code even better!