Topics
-
AI infrastructure and operations
-
GPU Computing
- Multi-GPU scaling optimization techniques
- Memory optimization for large-scale training
- Distributed training and inference architectures
- Production deployment strategies
-
MLOps
- Inference optimization techniques
- Deployment and inference techniques on edge devices
- Model performance monitoring at scale
- Standardized deployment practices
-
CI/CD:
- Case studies of CI/CD
- Advances in CI/CD pipeline automation, including the use of new tools and techniques to streamline deployments and reduce time to market
-
Core OS engineering
-
Distributed systems
- Testing for distributed systems
-
Cloud security
-
Container orchestration
-
Deployment architecture of control plane plus data plane - why your organization adopted this approach.
-
Detection engineering - detecting security issues and vulnerabilities in production
-
Experience reports from using open source and proprietary tools for security
-
Databases in production, including:
- Why and how you chose your database
- Comparison of choices; trade-offs you have made for your architecture/use case
-
FinOps
-
High availability
-
Hardware architecture and optimization
-
Performance engineering
- Microarchitecture deep-dives for modern processors (AMD’s and Intel’s new processors have been in the news quite a lot)
- Performance debugging with system profilers such as perf and bpftrace
- Architecture-specific optimization case studies
- Understanding hardware-software interactions
-
AI hardware evolution
- GPU architecture and capabilities
- Custom accelerator deployments
- Cost-performance analysis for AI workloads
- Hardware selection strategies
-
Infrastructure engineering to manage and deploy complex systems
-
Kubernetes in production, with a focus on real-world experiences and lessons learned, including:
- Best practices for scaling
- k8s security
- k8s troubleshooting
- Getting started with k8s, including technical challenges and/or decision challenges
-
Incident management, including:
- Case studies of how you or your team handled incidents
- Documenting incidents - how you and your team do it
- Tools used for incident management and reporting
- Root cause analysis
-
Multi-cloud and hybrid cloud strategies, including:
- Case studies of managing and optimizing multi-cloud and hybrid cloud environments
- Interoperability challenges and wins
- Cost management
- Security
- Multi-cloud strategies to avoid vendor lock-in and improve redundancy
- Complexity in management and security
-
Observability, including:
- Best practices for observability in modern applications
- Tools and techniques for tracking metrics, and which metrics to track
- Logging
- Tracing to gain insights into system performance and reliability
-
Reliability
-
Resiliency, including:
- Techniques for building and maintaining resilient systems
- Chaos engineering - experience reports
- Fault tolerance
- Disaster recovery planning
-
Securing systems running in production - incorporating security practices earlier in the software development lifecycle, commonly referred to as “shifting left” in DevSecOps
-
Security for AI systems - tools, case studies.
-
Serverless architectures: costs, benefits (such as reducing operational overhead) and challenges (such as cold starts and monitoring).
-
AI security products - experience reports
-
Software delivery management
-
Software architecture, including:
- Decision journeys
- Case studies
- Failure stories
-
System Internals for SRE and performance
-
Database Systems
- Query execution and optimization
- Storage engine internals
- Performance tuning stories
- High-throughput transaction processing
- Vector databases: algorithms, internals, or performance
-
Programming language internals for better observability and building high-performance systems
- New performance-related changes and features in languages such as Python, Java, and Rust
- Runtime optimization techniques
- Memory management patterns
- Concurrency and parallelism
- System-level programming languages: Rust, Zig, Mojo
-
Linux Internals
- New features:
- eBPF for system observability
- Custom scheduling with sched_ext
- io_uring developments and capabilities
-
Security:
- Speculative execution vulnerabilities and their impact on systems performance
-
Testing and validation
- Distributed systems testing
- Deterministic simulation testing
- Fault injection and chaos engineering
-
Workflow management using Temporal