Submissions

Rootconf submissions for technical reviews

Receive feedback and visibility for your work

Topics

  • AI infrastructure and operations

    • GPU Computing

      • Multi-GPU scaling optimization techniques
      • Memory optimization for large-scale training
      • Distributed training and inference architectures
      • Production deployment strategies
    • MLOps

      • Inference optimization techniques
      • Deployment and inference techniques on edge devices
      • Model performance monitoring at scale
      • Standardized deployment practices
  • CI/CD:

    • Case studies of CI/CD
    • Advances in CI/CD pipeline automation, including use of new tools and techniques to streamline deployments and reduce the time to market
  • Core OS engineering

  • Distributed systems

    • Testing for distributed systems
  • Cloud security

  • Container orchestration

  • Deployment architecture of control plane plus data plane - why your organization adopted this approach.

  • Detection engineering - detecting security issues and vulnerabilities in production

  • Experience reports from using open source and proprietary tools for security

  • Databases in production, including:

    • Why and how you chose your database
    • Comparison of choices; trade-offs you have made for your architecture/use case
  • FinOps

  • High availability

  • Hardware architecture and optimization

    • Performance engineering

      • Microarchitecture deep dives for modern processors (AMD’s and Intel’s new processors have been in the news quite a lot)
      • Performance debugging with system profilers such as perf and bpftrace
      • Architecture-specific optimization case studies
      • Understanding hardware-software interactions
    • AI hardware evolution

      • GPU architecture and capabilities
      • Custom accelerator deployments
      • Cost-performance analysis for AI workloads
      • Hardware selection strategies
  • Infrastructure engineering to manage and deploy complex systems

  • Kubernetes in production, with a focus on real-world experiences and lessons, including:

    • Best practices for scaling
    • k8s security
    • k8s troubleshooting
    • Getting started with k8s, including technical challenges and/or decision challenges
  • Incident management, including:

    • Case studies of how you/your team did it
    • Documenting incidents - how you and your team do it
    • Tools used for incident management and reporting
    • Root cause analysis
  • Multi-cloud and hybrid cloud strategies, including:

    • Case studies of managing and optimizing multi-cloud and hybrid cloud environments
    • Interoperability challenges and wins
    • Cost management
    • Security
    • Multi-cloud strategies to avoid vendor lock-in and improve redundancy
    • Complexity in management and security
  • Observability, including:

    • Best practices for observability in modern applications
    • Tools and techniques used, and the kinds of metrics they track
    • Logging
    • Tracing to gain insights into system performance and reliability
  • Reliability

  • Resiliency, including:

    • Techniques for building and maintaining resilient systems
    • Chaos engineering - experience reports
    • Fault tolerance
    • Disaster recovery planning
  • Securing systems running in production - incorporating security practices earlier in the software development lifecycle, commonly referred to as “shifting left” in DevSecOps

  • Security for AI systems - tools, case studies.

  • Serverless architectures: costs, benefits (such as reducing operational overhead) and challenges (such as cold starts and monitoring).

  • AI security products - experience reports

  • Software delivery management

  • Software architecture, including:

    • Decision journeys
    • Case studies
    • Failure stories
  • System Internals for SRE and performance

    • Database Systems

      • Query execution and optimization
      • Storage engine internals
      • Performance tuning stories
      • High-throughput transaction processing
      • Vector databases: algorithms, internals, or performance
    • Programming language internals for better observability and building high-performance systems

      • New performance-related changes and features in languages such as Python, Java, Rust
      • Runtime optimization techniques
      • Memory management patterns
      • Concurrency and parallelism
      • System-level programming languages: Rust, Zig, Mojo
    • Linux Internals

      • New features:
        • eBPF for system observability
        • Custom scheduling with sched_ext
        • io_uring developments and capabilities
  • Security:

    • Speculative execution vulnerabilities and their impact on systems performance
  • Testing and validation

    • Distributed Systems Testing
    • Deterministic Simulation Testing
    • Fault injection and chaos engineering
  • Workflow management using Temporal

Hybrid access (members only)

Hosted by

We care about site reliability, cloud costs, security and data privacy