@yegor256
Last active August 6, 2025 05:17
student-projects.md

If you are a student, I would be glad to supervise your diploma or course project on one of the following topics (in no particular order). You may also find additional topics here: PainOfOOP, SQM, PMBA, OSBP.

Type Inference for EO:
There are no types in EO, an experimental object-oriented programming language. It would be interesting to create a type inference subsystem, which would "guess" types of objects (we call them "formas") with high enough precision. Similar type inference systems exist, for example, for Python (pytype) and JavaScript (flow).
Complexity: High; Effort: High; Stack: Java

Object-Oriented Benchmark for JVM:
SPECjbb 2015 is a de facto industry standard for benchmarking the performance of Java Virtual Machines (JVMs). However, this benchmark has two problems: 1) it's not open source and 2) it's rather old. On top of this, it doesn't focus on the object-oriented features of the JVM. The few other available benchmarks, including DaCapo and Renaissance, also lack specialized tests for object allocation and dynamic dispatch. We can create a new benchmark (a collection of Java classes) that would help test the performance of a JVM with a strong focus on its object-oriented features. Then, we can test a dozen available JVMs against the benchmark and publish our results.

Evaluation of Performance of malloc() in Different Operating Systems:
Heap is the primary storage for variable-sized memory blocks in modern operating systems and virtual machines. Allocating a slice of bytes in the heap and then releasing it back is a time-consuming operation, requiring several hundred CPU cycles. However, the exact number of cycles it takes to allocate and free memory chunks in different virtual machines and OSs remains unclear, even though some folklore studies exist. We suggest studying this subject, performing experiments on a sufficiently large number of testing platforms, summarizing and analyzing the results, and then publishing the results. Such an analysis might assist creators of programming languages and compilers in making better design decisions.
Complexity: High; Effort: High; Stack: C++

Study of Data Presence in Design Patterns:
In object-oriented programming, many design patterns are recommended for use. It’s commonly believed that if programmers adhere to these patterns in their code, the code quality will improve due to clearer design. We hypothesize that most design patterns emphasize the encapsulation of behaviors rather than data. In other words, the objects participating in design patterns are typically “dataless” objects. We propose studying this subject through a Systematic Literature Review (SLR) of existing literature on design patterns, aiming to either confirm or refute our hypothesis. The results of our research might be useful for compiler and programming language designers, prompting them to treat objects differently if they are dataless.
Complexity: Low; Effort: Middle; Stack: Java

Query Optimizations in Factbase:
Factbase is an open source NoSQL database engine, implemented in Ruby, with a Lisp-ish query interface. Its indexing and caching mechanisms are rather naive and suboptimal: some queries take too long to execute even on small datasets. We need to invent and implement better runtime optimizations.
Complexity: High; Effort: High; Stack: Ruby

Inspector for EO:
EO is a compiled object-oriented programming language. We need to create a runtime debugger/inspector for it. The back end of the inspector must be written in Java and its front end in JavaScript. We started implementing this feature earlier but have not finished it yet: objectionary/eoc#500.
Complexity: Middle; Effort: Middle; Stack: Java, JavaScript

JSmith:
CSmith is a famous tool for random C code generation. We want to create a similar tool for the Java language. It should be an open source command line tool that generates Java code according to the provided configuration parameters. We have already created a draft (you will have to implement its entire functionality).
Complexity: Middle; Effort: Low; Stack: Java
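
To give a feeling for what "random code generation according to configuration params" could mean, here is a minimal sketch in Python (chosen only for brevity; the project itself targets Java). Everything in it is an assumption made for illustration: the function names, the `seed`, `methods`, and `depth` parameters, and the shape of the generated class are hypothetical and are not taken from the existing draft.

```python
import random


def random_expr(rng, depth):
    """Build a random Java int expression from literals and +, -, *."""
    if depth == 0 or rng.random() < 0.3:
        return str(rng.randint(0, 99))
    op = rng.choice(["+", "-", "*"])
    left = random_expr(rng, depth - 1)
    right = random_expr(rng, depth - 1)
    return f"({left} {op} {right})"


def random_class(seed, methods=3, depth=3):
    """Generate the source of one Java class; the same seed gives the same code."""
    rng = random.Random(seed)
    lines = [f"public class Generated{seed} {{"]
    for i in range(methods):
        lines.append(f"    static int m{i}() {{")
        lines.append(f"        return {random_expr(rng, depth)};")
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


print(random_class(seed=42))
```

Seeding the generator makes every produced program reproducible, which matters when a generated program exposes a compiler or JVM bug and has to be regenerated later.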

EOdoc:
EO is an experimental object-oriented programming language. We need to create an automated documentation generator for it, which could read source code files and generate HTML pages with the documentation, similar to how javadoc and rustdoc work. We have already started doing this, but didn't finish: objectionary/eoc#467.
Complexity: Low; Effort: Low; Stack: JavaScript

EOfmt:
It would be nice to create an auto-formatter for the EO programming language. Similar to rustfmt, it should take an .eo file and reformat it.
Complexity: Low; Effort: Low; Stack: JavaScript

0PDD:
0PDD is a hosted open source GitHub chatbot that helps programmers decompose their tasks on the fly, via special TODO markers in their source code, known as "puzzles". We need to make it use Machine Learning in order to prioritize the backlogs of GitHub repositories. You will need to use Ruby and its existing ML frameworks, most probably rumale.
Complexity: High; Effort: Middle; Stack: Ruby

Port Xembly to JavaScript/Python/C++/C#:
Xembly, created in 2013, is an imperative language for XML manipulations and a Java library that implements the language. Currently, there are only two implementations of Xembly: in Java and Ruby. It would be great to port it to a few other programming languages, like C#, C++, Python, JavaScript, Go, and maybe others.
Complexity: Middle; Effort: Middle; Stack: Any
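
As a rough illustration of what a port involves, here is a minimal Python sketch of an interpreter for four Xembly directives (`ADD`, `SET`, `ATTR`, `UP`). It makes several simplifying assumptions that the real language does not: the cursor is a single node rather than a set of nodes, the document starts with a pre-created root element, the script is split on every `;` (so arguments may not contain one), and arguments are double-quoted only. The function name `run_xembly` and its `root_name` parameter are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def run_xembly(script, root_name="root"):
    """Apply ADD, SET, ATTR, and UP directives to a fresh XML document."""
    root = ET.Element(root_name)
    path = [root]  # path[-1] is the cursor; the items before it are its ancestors
    for chunk in script.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        command = chunk.split()[0]
        args = re.findall(r'"([^"]*)"', chunk)  # all double-quoted arguments
        if command == "ADD":
            path.append(ET.SubElement(path[-1], args[0]))
        elif command == "SET":
            path[-1].text = args[0]
        elif command == "ATTR":
            path[-1].set(args[0], args[1])
        elif command == "UP":
            if len(path) > 1:  # the root has no parent to move to
                path.pop()
        else:
            raise ValueError(f"unsupported directive: {command}")
    return ET.tostring(root, encoding="unicode")


print(run_xembly('ADD "order"; ATTR "id", "553"; SET "140"; UP; ADD "note";'))
```

A complete port would also need the remaining directives (for example `XPATH`, `ADDIF`, and `REMOVE`) and a proper grammar instead of the regular expression used here.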

Simple-XSL:
XSLT is a functional language for XML transformations. It is widely used in web development, ETL pipelines, and even in language design (the EO-to-Java compiler is written in XSLT). However, the complexity of the language is often a barrier for programmers not familiar with XML. We can create a new language with exactly the same grammar and semantics as XSLT 3.0, but with a simpler syntax. A good example of such an approach is HAML, which is a simplified version of HTML.
Complexity: Middle; Effort: High; Stack: any
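
One possible direction, sketched below in Python, is an indentation-based notation that is translated line by line into ordinary XSLT. The notation itself is purely hypothetical (it is invented here for illustration, not proposed by the project): each line holds an XSLT element name without the `xsl:` prefix, followed by its attributes, and nesting is expressed by two-space indentation. The sketch handles only XSLT instructions, not literal result elements or text, and it copies attribute values verbatim without escaping.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def to_xslt(source, indent=2):
    """Translate the indentation-based notation into XSLT markup."""
    out, open_tags = [], []  # open_tags is a stack of (depth, element name)
    for line in source.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        # Close every open element that is not an ancestor of this line.
        while open_tags and open_tags[-1][0] >= depth:
            closed_depth, closed = open_tags.pop()
            out.append("  " * closed_depth + f"</xsl:{closed}>")
        name, _, attrs = line.strip().partition(" ")
        if not open_tags:  # the root element declares the namespace
            attrs = f'{attrs} xmlns:xsl="{XSL_NS}"'.strip()
        suffix = f" {attrs}" if attrs else ""
        out.append("  " * depth + f"<xsl:{name}{suffix}>")
        open_tags.append((depth, name))
    while open_tags:
        closed_depth, closed = open_tags.pop()
        out.append("  " * closed_depth + f"</xsl:{closed}>")
    return "\n".join(out)


SAMPLE = """\
stylesheet version="3.0"
  template match="/"
    value-of select="name"
"""

xslt = to_xslt(SAMPLE)
ET.fromstring(xslt)  # raises ParseError if the output is not well-formed XML
print(xslt)
```

Because the translation is purely syntactic, the semantics stay those of XSLT, which is the property the project asks for.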

IntelliJ IDEA Plugin for EO:
A few years ago, we created an EO plugin for IntelliJ IDEA. It highlights the syntax of EO and helps detect syntax errors. However, its functionality is pretty limited: it doesn't support code completion, compilation, debugging, and many other features a user would expect. It would be nice to improve the plugin and release an updated version with all the needed features.
Complexity: Middle; Effort: Middle; Stack: Java

CTFE for EO:
EO is an object-oriented programming language, where everything is an object, including arithmetic operators. It would be interesting to create a compile-time function executor as a command line tool. It would take an EO program and replace some of its expressions with constants. For example, 2.plus 2 would be replaced with the 4 literal, thus making the program faster.
Complexity: Middle; Effort: Low; Stack: Java
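
The core of such a tool is constant folding. The Python sketch below shows the idea on a toy expression tree; it does not parse real EO code. The tuple-based tree, the `fold` function, and the `FOLDABLE` table are assumptions made for this illustration, with `("plus", 2, 2)` standing for `2.plus 2`.

```python
import operator

# Hypothetical mini-AST: a node is an int literal, a name (str), or a tuple
# (method, receiver, argument), so ("plus", 2, 2) stands for `2.plus 2`.
FOLDABLE = {"plus": operator.add, "minus": operator.sub, "times": operator.mul}


def fold(node):
    """Replace every sub-expression whose operands are all literals with its value."""
    if not isinstance(node, tuple):
        return node  # a literal or a free name: nothing to evaluate
    method, receiver, argument = node
    receiver, argument = fold(receiver), fold(argument)
    if method in FOLDABLE and isinstance(receiver, int) and isinstance(argument, int):
        return FOLDABLE[method](receiver, argument)
    return (method, receiver, argument)  # keep what cannot be computed yet


print(fold(("plus", 2, 2)))                  # 4
print(fold(("times", "x", ("plus", 2, 2))))  # ('times', 'x', 4)
```

Folding bottom-up means that an expression is simplified as far as its literal operands allow, while anything that depends on a run-time value such as `x` is left untouched.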

XSL linter:
In our projects, we have many XSL documents. They are essentially XML documents and we validate their formatting using xcop. However, we don't use any XSL static analyzer. The problem is that there is none on the market. Only a prototype exists, which is not so easy to use. We can take this prototype, use it as a basis, and create our own XSL linter.
Complexity: Middle; Effort: Middle; Stack: Java
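
To make "XSL static analyzer" more concrete, here is a minimal Python sketch with two checks. The checks are examples chosen for this illustration, not the rule set of the existing prototype, and the `lint` function is hypothetical. The second check looks at a single file only, so it would produce false positives for templates that are called from an importing stylesheet.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"


def lint(xsl_text):
    """Return a list of human-readable warnings for one stylesheet."""
    root = ET.fromstring(xsl_text)
    issues = []
    called = {c.get("name") for c in root.iter(XSL + "call-template")}
    orphans = set()
    for template in root.iter(XSL + "template"):
        name, match = template.get("name"), template.get("match")
        if name is None and match is None:
            issues.append("template has neither 'match' nor 'name' attribute")
        elif name is not None and match is None and name not in called:
            orphans.add(name)
    for name in sorted(orphans):
        issues.append(f"named template '{name}' is never called")
    return issues


SAMPLE = """\
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/"><xsl:call-template name="used"/></xsl:template>
  <xsl:template name="used"/>
  <xsl:template name="orphan"/>
  <xsl:template/>
</xsl:stylesheet>
"""

for issue in lint(SAMPLE):
    print(issue)
```

A real linter would need many more rules, positions (line and column) for each warning, and awareness of `xsl:import` and `xsl:include`.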

Goto Eliminator for C/C++:
The C and C++ programming languages have a goto statement, which can be used freely anywhere in the code. The presence of goto statements often makes code impossible to translate to EO, even though EO has a goto object. It would be nice to have an automated tool that takes C/C++ code as input and generates new C/C++ code without goto statements. This paper may be relevant.
Complexity: High; Effort: High; Stack: any

EO Runtime Libraries:
At the moment, the EO programming language has a very limited set of objects in its runtime library. It would be nice to add more objects in order to make the language more convenient to use. The following groups of objects are needed most:

  • eo-dom: to build and traverse XML documents, using DOM and XPath
  • eo-sax: to scan XML documents, using SAX
  • eo-http: to send and receive HTTP
  • eo-time: to parse, print, and manage date/time
  • eo-exec: to interact with the command line (exec and spawn)
  • eo-unicode: to manage UTF-8/16 strings
  • eo-base64: to encode and decode Base64/Base58 strings
  • eo-sprintf: to simulate sprintf
  • eo-pgsql: to interact with PostgreSQL
  • eo-xembly: to modify XML using Xembly language
  • eo-json: to parse and print JSON documents

Complexity: Middle; Effort: Middle; Stack: Java

Graph Algorithms in EO:
We have a simple benchmark of three programming languages (C++, Java, and EO) in this repository. However, only one algorithm is implemented in EO. It would be great to implement the other three algorithms too and then run the benchmark. Then, it would be interesting to summarize the results and publish an academic paper about it.
Complexity: Middle; Effort: Middle; Stack: Java, C++
