From User Stories to Knowledge Graphs: Automated Requirements Engineering with LLMs

Abstract

Traditional agile development relies on user stories written in natural language, but analyzing complex product backlogs for dependencies, conflicts, and relationships remains a manual, error-prone process. This whitepaper presents a practical workflow for automated knowledge graph extraction from user stories using Large Language Models (LLMs), based on recent research advances in the field. We demonstrate how the UserStoryGraphTransformer research principles can be applied using n8n workflow automation and Neo4j graph databases to create structured JSON output suitable for requirements engineering. While full transformer implementation remains future work, our approach validates the core extraction methodology and provides a foundation for production deployment.

1. Introduction

Agile software development has standardized around user stories as the primary mechanism for capturing requirements. These stories follow the familiar template: "As a [persona], I want [what], so that [why]." While this format effectively communicates individual requirements, it creates significant challenges when managing complex product backlogs containing hundreds or thousands of interdependent stories.

Traditional requirements engineering approaches struggle with several key problems:

  • Limited system perspective: Individual user stories provide isolated views of functionality
  • Hidden dependencies: Relationships between requirements are not explicit in textual form
  • Scalability challenges: Manual analysis becomes impractical for large backlogs
  • Inconsistency detection: Conflicts and redundancies are difficult to identify systematically
  • Traceability gaps: Connecting requirements to business objectives requires manual effort

Recent advances in Large Language Models (LLMs) and knowledge graph technologies offer a solution to these challenges through automated extraction and representation of requirements relationships.

2. Related Work

2.1 Knowledge Graph Extraction from User Stories

Silva (2025) introduced the UserStoryGraphTransformer (USGT) framework in "Extracting Knowledge Graphs from User Stories using LangChain" [arXiv:2506.11020]. This work demonstrated that LLMs can systematically extract structured knowledge representations from natural language user stories, addressing limitations of earlier NLP approaches.

The research established several key principles:

  1. Dual-prompt architecture: Separate prompts for entity/relationship extraction and benefit analysis
  2. Domain-specific entity types: Actor, Action, Entity, Benefit classifications rather than generic categories
  3. Semantic relationship mapping: Relationships that capture user story grammar (PERFORMS, ACHIEVES, REQUIRES)
  4. Model-agnostic approach: Framework supporting multiple LLM providers

Silva's evaluation against existing methods showed significant improvements over tools like Visual Narrator, with superior accuracy in entity recognition and relationship extraction when validated against an annotated dataset of 1,459 user stories.

2.2 Requirements Engineering Knowledge Graphs

Knowledge graphs have emerged as a powerful tool for requirements engineering, providing structured representations that reveal relationships not obvious in textual form (Silva, 2025). These graphs enable:

  • Systematic dependency analysis
  • Conflict and redundancy detection
  • Traceability management
  • Stakeholder communication through visualization
  • Automated quality checks

Applying knowledge graphs to agile development addresses the fundamental limitation that individual user stories provide only an isolated system perspective, enabling more sophisticated requirements analysis and decision support.

3. Implementation Approach

3.1 Current Implementation Scope

Our current work focuses on validating the extraction methodology using n8n's Information Extractor node with enhanced prompts and JSON schema based on Silva's research. This approach demonstrates the viability of the UserStoryGraphTransformer principles without requiring custom LangChain component development.

What we implemented:

  • Enhanced JSON schema based on arXiv:2506.11020 research
  • Optimized system prompts for user story analysis
  • n8n workflow for automated extraction
  • Neo4j transformation and ingestion patterns

Future work:

  • Full UserStoryGraphTransformer LangChain component
  • Dual-prompt architecture implementation
  • Model-agnostic framework with multiple LLM providers
  • Advanced relationship inference and enrichment

3.2 Technology Stack

Our validation implementation leverages:

  • n8n: Workflow automation platform for orchestrating the extraction pipeline
  • LLM Integration: Information Extractor node supporting multiple model providers
  • Neo4j: Graph database for persistent storage and querying of requirements knowledge

This architecture provides a practical pipeline that can process user stories at scale while maintaining data quality and supporting rich querying.

3.3 Enhanced Schema Design

Building on Silva's research, we developed an enhanced JSON schema that captures both the semantic structure of user stories and metadata essential for requirements engineering:

{
  "entities": [
    {
      "name": "string",
      "type": "Actor|Action|Entity|Benefit|System|Constraint",
      "properties": {
        "description": "string",
        "story_component": "who|what|why",
        "priority": "Low|Medium|High|Critical",
        "complexity": "Simple|Medium|Complex", 
        "domain": "string"
      }
    }
  ],
  "relationships": [
    {
      "source": "string",
      "target": "string", 
      "type": "PERFORMS|DESIRES|ACHIEVES|ENABLES|REQUIRES|INTERACTS_WITH",
      "properties": {
        "strength": "Weak|Medium|Strong",
        "dependency_type": "Optional|Required|Blocking",
        "story_connection": "string"
      }
    }
  ],
  "story_metadata": {
    "epic": "string",
    "acceptance_criteria": ["string"],
    "story_points": "number"
  }
}
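
Because LLM output does not always conform to a requested schema, downstream steps benefit from a lightweight guard. The JavaScript sketch below is illustrative: it hard-codes the allowed values from the schema above and rejects malformed extractions before they reach Neo4j.

// Minimal guard: reject extractions whose types fall outside the schema above
const ENTITY_TYPES = ['Actor', 'Action', 'Entity', 'Benefit', 'System', 'Constraint'];
const RELATION_TYPES = ['PERFORMS', 'DESIRES', 'ACHIEVES', 'ENABLES', 'REQUIRES', 'INTERACTS_WITH'];

function validateExtraction(output) {
  const badEntity = output.entities.find(e => !ENTITY_TYPES.includes(e.type));
  if (badEntity) throw new Error(`Unknown entity type: ${badEntity.type}`);
  const badRel = output.relationships.find(r => !RELATION_TYPES.includes(r.type));
  if (badRel) throw new Error(`Unknown relationship type: ${badRel.type}`);
  return output;
}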

3.4 Entity Classification System

The enhanced entity types provide semantic precision aligned with user story structure:

  • Actor: Users, roles, stakeholders (the "who")
  • Action: What the user wants to accomplish (the "what")
  • Entity: Objects, data, or resources involved
  • Benefit: The value or outcome desired (the "why")
  • System: Technical components or interfaces
  • Constraint: Limitations or requirements

3.5 Relationship Taxonomy

Relationships capture the semantic connections within and between user stories; the sketch after this list shows the taxonomy applied to this paper's running example:

  • PERFORMS: Actor performs Action
  • DESIRES: Actor desires Benefit
  • ACHIEVES: Action achieves Benefit
  • ENABLES: Entity enables Action
  • REQUIRES: Action requires Entity/System
  • INTERACTS_WITH: Actor interacts with System/Entity
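
Applied to the dashboard customization story used throughout this paper, the taxonomy yields the triples below; they match the full extraction shown in Section 5.2.

// Relationship triples extracted from the running example (see Section 5.2)
const triples = [
  ['User', 'PERFORMS', 'Customize Layout'],
  ['Customize Layout', 'ACHIEVES', 'Arrange Information Suitably'],
  ['Customize Layout', 'REQUIRES', 'Dashboard Layout']
];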

4. Practical Implementation

4.1 n8n Workflow Configuration

The extraction pipeline consists of several n8n nodes configured as follows:

Manual Trigger Node:

{
  "user_story": "As a user, I want to customize the layout of my dashboard so that I can arrange information in a way that suits me."
}

Information Extractor Node Configuration:

  • Model: GPT-4o-mini (or preferred LLM)
  • Temperature: 0.1 for consistent structured output
  • System Message: Custom prompt based on UserStoryGraphTransformer research (an illustrative sketch follows this list)
  • Schema: Enhanced JSON schema as defined above
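
The exact prompt text is not reproduced in the research; the constant below is an illustrative sketch assembled from the entity and relationship taxonomy in Section 3, not the prompt used in Silva's evaluation.

// Illustrative system message (not the original research prompt)
const SYSTEM_MESSAGE = `You are a requirements analyst. Extract a knowledge
graph from the user story. Classify entities as Actor, Action, Entity,
Benefit, System, or Constraint; map each to the who/what/why component of
the story; and connect them with PERFORMS, DESIRES, ACHIEVES, ENABLES,
REQUIRES, or INTERACTS_WITH relationships. Respond with JSON matching the
provided schema.`;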

Transformation Function:

// n8n Code node: reshape the Information Extractor output for Neo4j ingestion
const output = $input.all()[0].json.output;

// Flatten each entity's properties into an observations array for Neo4j
const entities = output.entities.map(entity => ({
  name: entity.name,
  type: entity.type,
  observations: [
    entity.properties.description,
    `Story component: ${entity.properties.story_component}`,
    `Domain: ${entity.properties.domain}`,
    `Priority: ${entity.properties.priority}`,
    `Complexity: ${entity.properties.complexity}`
  ]
}));

// Carry relationship properties through unchanged
const relations = output.relationships.map(rel => ({
  source: rel.source,
  target: rel.target,
  relationType: rel.type,
  properties: rel.properties
}));

// n8n Code nodes must return an array of items
return [{ json: { entities, relations } }];

4.2 Neo4j Integration

The transformed data is ingested into Neo4j using two operations:

  1. Entity Creation: neo4j:create_entities with enhanced observations
  2. Relationship Creation: neo4j:create_relations with properties

This creates a queryable graph where requirements can be analyzed using Cypher queries.
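
For teams driving Neo4j directly rather than through these operations, a minimal ingestion sketch using the official neo4j-driver package is shown below. It assumes the { entities, relations } shape produced by the transformation function in Section 4.1; the connection details are placeholders.

// Sketch: ingest transformed entities and relations with the neo4j-driver.
// Labels and relationship types come from the fixed taxonomy above; Cypher
// cannot parameterize them, so interpolation is safe only for trusted values.
const neo4j = require('neo4j-driver');

async function ingest(entities, relations) {
  const driver = neo4j.driver('bolt://localhost:7687',
    neo4j.auth.basic('neo4j', 'password')); // placeholder credentials
  const session = driver.session();
  try {
    for (const e of entities) {
      // MERGE keeps ingestion idempotent: re-running a story adds no duplicates
      await session.run(
        `MERGE (n:\`${e.type}\` {name: $name}) SET n.observations = $obs`,
        { name: e.name, obs: e.observations }
      );
    }
    for (const r of relations) {
      await session.run(
        `MATCH (a {name: $source}), (b {name: $target})
         MERGE (a)-[rel:\`${r.relationType}\`]->(b) SET rel += $props`,
        { source: r.source, target: r.target, props: r.properties }
      );
    }
  } finally {
    await session.close();
    await driver.close();
  }
}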

5. Validation Example: Dashboard Customization Story

5.1 Input User Story

"As a user, I want to customize the layout of my dashboard so that I can arrange information in a way that suits me."

5.2 Complete LLM Output

The n8n Information Extractor with our enhanced schema produced the following structured JSON:

{
  "output": {
    "entities": [
      {
        "name": "User",
        "type": "Actor",
        "properties": {
          "description": "A user who interacts with the system",
          "story_component": "who",
          "priority": "High",
          "complexity": "Simple",
          "domain": "User Interaction"
        }
      },
      {
        "name": "Customize Layout",
        "type": "Action",
        "properties": {
          "description": "The action of modifying the dashboard layout",
          "story_component": "what",
          "priority": "High",
          "complexity": "Medium",
          "domain": "Dashboard Configuration"
        }
      },
      {
        "name": "Dashboard Layout",
        "type": "Entity",
        "properties": {
          "description": "The arrangement of elements on the user's dashboard",
          "story_component": "what",
          "priority": "High",
          "complexity": "Medium",
          "domain": "Dashboard"
        }
      },
      {
        "name": "Arrange Information Suitably",
        "type": "Benefit",
        "properties": {
          "description": "Ability to organize information in a way that suits the user",
          "story_component": "why",
          "priority": "High",
          "complexity": "Simple",
          "domain": "User Experience"
        }
      }
    ],
    "relationships": [
      {
        "source": "User",
        "target": "Customize Layout",
        "type": "PERFORMS",
        "properties": {
          "strength": "Strong",
          "dependency_type": "Required",
          "story_connection": "The user performs the action of customizing the layout"
        }
      },
      {
        "source": "Customize Layout",
        "target": "Arrange Information Suitably",
        "type": "ACHIEVES",
        "properties": {
          "strength": "Strong",
          "dependency_type": "Required",
          "story_connection": "Customizing the layout achieves the benefit of arranging information suitably"
        }
      },
      {
        "source": "Customize Layout",
        "target": "Dashboard Layout",
        "type": "REQUIRES",
        "properties": {
          "strength": "Strong",
          "dependency_type": "Required",
          "story_connection": "Customizing layout requires the dashboard layout entity"
        }
      }
    ],
    "story_metadata": {
      "epic": "Dashboard Personalization",
      "acceptance_criteria": [
        "User can modify the arrangement of dashboard elements",
        "Changes to layout are saved and persist across sessions",
        "User can arrange information in a way that suits their preferences"
      ],
      "story_points": 5
    }
  }
}

5.3 Analysis of Results

This extraction demonstrates several key capabilities of the enhanced approach:

  1. Semantic precision: Clear distinction between Actor, Action, Entity, and Benefit
  2. Story component mapping: Explicit connection to who/what/why structure
  3. Relationship semantics: Meaningful connections using domain-appropriate verbs
  4. Metadata enrichment: Automatic generation of epic classification and acceptance criteria
  5. Domain categorization: Functional grouping enabling cross-story analysis

6. Benefits and Applications

6.1 Requirements Analysis

The knowledge graph representation enables sophisticated analysis:

Dependency Analysis:

MATCH (story:UserStory)-[:CONTAINS]->(action:Action)-[r:REQUIRES]->(entity)
RETURN story.name, action.name, r.dependency_type, entity.name

Epic Organization:

MATCH (s:UserStory)
WHERE ANY(obs IN s.observations WHERE obs CONTAINS 'Epic: Dashboard')
RETURN s.name, s.observations

Cross-functional Impact:

MATCH (actor:Actor)-[:PERFORMS]->(action:Action)-[:REQUIRES]->(system:System)
RETURN actor.name, action.name, system.name

6.2 Quality Assurance

The structured representation enables automated quality checks:

  • Completeness validation: Ensure all stories have Actor, Action, and Benefit (see the sketch after this list)
  • Consistency checking: Identify conflicting requirements across stories
  • Dependency mapping: Visualize prerequisite relationships
  • Coverage analysis: Verify all system components are addressed
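
A sketch of the first check appears below. It assumes the same UserStory and CONTAINS modeling used by the dependency query in Section 6.1, which is an assumption about how stories are linked to their extracted entities rather than part of the validated workflow.

// Completeness check: flag stories missing an Actor, Action, or Benefit.
// Assumes (story:UserStory)-[:CONTAINS]->(entity) links, as in Section 6.1.
const completenessQuery = `
  MATCH (s:UserStory)
  WHERE NOT (s)-[:CONTAINS]->(:Actor)
     OR NOT (s)-[:CONTAINS]->(:Action)
     OR NOT (s)-[:CONTAINS]->(:Benefit)
  RETURN s.name AS incompleteStory
`;
// Run with session.run(completenessQuery) using the driver setup from Section 4.2.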

6.3 Stakeholder Communication

Knowledge graphs provide intuitive visualizations that facilitate:

  • Product owner discussions about feature relationships
  • Development team understanding of system dependencies
  • Stakeholder alignment on epic scope and priorities
  • Transparent communication of technical constraints

7. Scalability and Performance

7.1 Batch Processing

The n8n workflow can be extended for batch processing multiple stories:

{
  "user_stories": [
    {"id": "US001", "story": "As a user..."},
    {"id": "US002", "story": "As a developer..."},
    {"id": "US003", "story": "As a manager..."}
  ]
}
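
A minimal fan-out sketch for an n8n Code node is shown below. It splits the batch payload into one item per story so the Information Extractor processes each independently; the story_id field is an assumption used to join results back to their source stories.

// n8n Code node sketch: emit one item per story for downstream extraction
const { user_stories } = $input.first().json;

return user_stories.map(s => ({
  json: {
    story_id: s.id,      // kept so extraction results can be traced back
    user_story: s.story
  }
}));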

7.2 Incremental Updates

Neo4j's graph structure supports incremental updates as requirements evolve:

  • Add new stories without rebuilding entire graph
  • Update relationships when dependencies change
  • Track requirement evolution over time
  • Maintain history for audit purposes (see the timestamp sketch below)
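
One lightweight way to support the audit point, an assumption rather than part of the validated workflow, is to timestamp every upsert so the graph records when each requirement was created and last changed:

// Sketch: stamp each upsert so requirement evolution can be queried later
// (the Requirement label is illustrative; substitute the entity-type labels
// used during ingestion)
const upsertWithHistory = `
  MERGE (n:Requirement {name: $name})
  ON CREATE SET n.created_at = datetime()
  SET n.updated_at = datetime(), n.observations = $obs
`;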

7.3 Integration Patterns

The architecture supports integration with existing tools:

  • Jira/Azure DevOps: Import stories from existing backlogs (a Jira sketch follows this list)
  • Confluence: Export visualizations for documentation
  • CI/CD pipelines: Automated dependency checking
  • Analytics platforms: Requirements metrics and reporting
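
As one illustration of the import path, the sketch below pulls story-type issues through the standard Jira Cloud REST search endpoint and shapes them as the batch payload from Section 7.1. The JQL and field mapping are assumptions that will vary by project.

// Sketch: fetch Jira stories and shape them as batch input for extraction
async function fetchJiraStories(baseUrl, email, apiToken) {
  const jql = encodeURIComponent('issuetype = Story ORDER BY created DESC');
  const res = await fetch(
    `${baseUrl}/rest/api/2/search?jql=${jql}&fields=summary,description`,
    {
      headers: {
        Authorization: 'Basic ' + Buffer.from(`${email}:${apiToken}`).toString('base64'),
        Accept: 'application/json'
      }
    }
  );
  const { issues } = await res.json();
  return {
    user_stories: issues.map(i => ({
      id: i.key,                                // the Jira issue key
      story: i.fields.description || i.fields.summary
    }))
  };
}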

8. Limitations and Future Work

8.1 Current Limitations

  1. LLM dependency: Quality depends on model capabilities and prompt engineering
  2. Domain specificity: May require customization for highly specialized domains
  3. Ambiguity handling: Complex or poorly written stories may produce inconsistent results
  4. Scale validation: Needs testing with larger, more complex product backlogs

8.2 Future Research and Development

Immediate Next Steps:

  1. UserStoryGraphTransformer Implementation: Develop full LangChain component based on Silva's dual-prompt architecture
  2. Model Comparison Study: Evaluate extraction quality across different LLM providers
  3. Batch Processing Optimization: Scale testing with larger story collections
  4. Advanced Relationship Inference: Implement cross-story dependency detection

Medium-term Directions:

  1. Multi-story Analysis: Detecting patterns and conflicts across story collections
  2. Temporal Evolution: Tracking how requirements change over time
  3. Predictive Analytics: Using graph patterns to forecast development complexity
  4. Integration Ecosystem: Connectors for major project management platforms

Long-term Vision:

  1. Automated Story Generation: Using knowledge graphs to suggest missing requirements
  2. AI-Assisted Requirements Review: Automated quality assurance and completeness checking
  3. Intelligent Backlog Management: AI-driven prioritization and planning recommendations

9. Conclusion

The validation of LLM-powered extraction principles demonstrates significant potential for transforming requirements engineering. By applying Silva's UserStoryGraphTransformer research through practical workflow tools, we have shown that teams can:

  • Extract structured, semantically rich representations from natural language user stories
  • Generate comprehensive metadata including epics, acceptance criteria, and complexity estimates
  • Create foundation data suitable for knowledge graph construction and analysis
  • Establish practical workflows that can be immediately adopted by development teams

Current State: Our work validates the core extraction methodology using n8n and enhanced JSON schemas, producing high-quality structured output that captures the semantic relationships within user stories.

Next Steps: Full implementation of the UserStoryGraphTransformer LangChain component will enable advanced features like dual-prompt architecture, model-agnostic processing, and sophisticated relationship inference.

This approach represents a significant step toward automated, systematic requirements analysis. While the complete transformer implementation remains future work, our validation demonstrates that the underlying principles are sound and can be operationalized using existing tools. As we progress toward full implementation, we expect this methodology to fundamentally change how teams manage and analyze requirements in agile development environments.

The structured JSON output and Neo4j integration patterns presented here provide a production-ready foundation that teams can adopt immediately, with clear evolution paths toward more sophisticated AI-assisted requirements engineering capabilities.

References

Silva, T. C. (2025). Extracting Knowledge Graphs from User Stories using LangChain. arXiv preprint arXiv:2506.11020. Brandenburg University of Technology Cottbus-Senftenberg.

Appendix A: Complete n8n Workflow Configuration

[Detailed JSON configuration for the complete n8n workflow, including all node settings, transformations, and Neo4j integration steps]

Appendix B: Sample Cypher Queries

[Collection of useful Cypher queries for requirements analysis, dependency tracking, and quality assurance]

Appendix C: Integration Examples

[Code examples for integrating with popular project management tools and development platforms]
