VII. FUTURE WORK
Given the diversity of threats and the variety of analysis
techniques, we highlight directions for future work that could
build on our results.
A. Analyzing Other Modified Artifacts
Fork Sentry does not account for adversarial changes to
source code, where injected code can discreetly carry out
malicious behavior. Catching such attacks will involve either
incorporating prior work on identifying malicious source
code and commits or applying source-based, diff-aware static
analysis.
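As a rough illustration of the diff-aware option, the sketch below inspects only the lines a fork adds relative to its upstream file and matches them against a small rule set. It is not part of Fork Sentry: the function names and the patterns in SUSPICIOUS_PATTERNS are hypothetical, and a practical rule set would come from the prior work mentioned above.

```python
import difflib
import re
from typing import List, Tuple

# Hypothetical indicators, chosen only for illustration; a real rule set
# would come from prior work on malicious source code and commits.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\beval\s*\("),
    re.compile(r"\bexec\s*\("),
    re.compile(r"base64\.b64decode"),
    re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}"),  # URL with a raw IPv4 host
]


def added_lines(upstream_src: str, fork_src: str) -> List[str]:
    """Return the lines the fork inserted or rewrote relative to upstream."""
    upstream = upstream_src.splitlines()
    fork = fork_src.splitlines()
    matcher = difflib.SequenceMatcher(a=upstream, b=fork, autojunk=False)
    added: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(fork[j1:j2])
    return added


def flag_fork_changes(upstream_src: str, fork_src: str) -> List[Tuple[str, str]]:
    """Pair each suspicious fork-only line with the first pattern it matched."""
    hits = []
    for line in added_lines(upstream_src, fork_src):
        for pattern in SUSPICIOUS_PATTERNS:
            if pattern.search(line):
                hits.append((line.strip(), pattern.pattern))
                break  # report each line once
    return hits
```

Restricting the scan to fork-only lines keeps the analysis proportional to the size of the change rather than the size of the repository, which matters when a popular project has thousands of forks.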
Another avenue for detection is compilation manifests,
which establish metadata, dependencies, and runtime behavior
during installation. The Octopus Scanner malware [29]
exemplifies this: the Java build process of a popular
project is backdoored to install malware. Parsers for different
manifest formats could be incorporated to statically analyze
these configurations and recognize when malicious capabilities
are introduced.
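A minimal sketch of such a manifest check follows. It assumes npm's package.json purely as an illustrative format, since it can be parsed with a standard JSON library; the helper name new_install_hooks is hypothetical, and other ecosystems (including the Java build files abused by Octopus Scanner) would need their own parsers.

```python
import json
from typing import Dict

# npm lifecycle scripts that run automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def new_install_hooks(upstream_manifest: str, fork_manifest: str) -> Dict[str, str]:
    """Return install-time scripts that the fork adds or changes."""
    upstream_scripts = json.loads(upstream_manifest).get("scripts", {})
    fork_scripts = json.loads(fork_manifest).get("scripts", {})
    return {
        hook: fork_scripts[hook]
        for hook in INSTALL_HOOKS
        if hook in fork_scripts
        and fork_scripts[hook] != upstream_scripts.get(hook)
    }
```

Comparing the fork's manifest against the upstream one, rather than scanning it in isolation, avoids flagging install hooks that the original project already ships.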
B. Improving Similarity Analysis
Previous studies [30] have shown that similarity hashes
like ssdeep, which apply a sliding window to the
entirety of a file, lack context about binary features that may be
critical to associating similar samples. As such, false negatives
may become prevalent when identifying compiled and released
samples, particularly if malware authors actively attempt
to evade similarity analysis.
When we evaluated our similarity analysis by comparing the
weaponized cryptominer propagated by the persistent threat
actor against all other samples detected in
pooler/cpuminer’s forks, we found that Fork Sentry fails
to recognize its similarity to the original cpuminer binaries,
even though it is only lightly modified.
In addition, because our scope is currently limited to
cryptocurrency repositories, obtaining reliable ground truth
about whether a given repository is malicious can be difficult.
Although we require that cryptominers have additional sus-
picious indicators (described in Section IV-B) to be deemed
actually malicious and worth reporting to GitHub, we lack
sufficient context to determine if any of the unreported repos-
itories were, in fact, malicious.
To address this, we plan to investigate other similarity
clustering techniques, perform deeper analysis on the meaning
of the changes made in a fork repository, and broaden the
scope of our work to incorporate other repositories and source
code.
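One direction that could be explored is comparing features extracted from a binary instead of hashing the whole file. The sketch below is an assumption for illustration, not a technique we have evaluated: it uses the set of printable strings as the feature and the Jaccard index as the similarity measure. The function names and the six-character minimum are arbitrary choices.

```python
import re
from typing import FrozenSet

# Runs of at least six printable ASCII bytes, similar to the `strings` utility.
PRINTABLE = re.compile(rb"[\x20-\x7e]{6,}")


def string_features(blob: bytes) -> FrozenSet[bytes]:
    """Extract the set of printable strings embedded in a binary."""
    return frozenset(PRINTABLE.findall(blob))


def jaccard(a: FrozenSet[bytes], b: FrozenSet[bytes]) -> float:
    """Jaccard index of two feature sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def string_similarity(x: bytes, y: bytes) -> float:
    """Score two binaries by the overlap of their embedded strings."""
    return jaccard(string_features(x), string_features(y))
```

Because the features form a set, the score does not depend on where each string sits in the file, so reordering sections or inserting a small payload changes it less than it would change a whole-file sliding-window hash. An adversary can still evade it by encrypting or stripping strings, so it would complement other features, not replace them.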
C. Expanding Target Scope
In this paper, we focused on analyzing cryptocurrency-
based repositories, a very small subset of all projects on
GitHub. This scope can be expanded to many other
open-source ecosystems. For instance, as demonstrated by
Henriksen’s [9] work, Go-based packages have been targets
of malware infection through typosquatted forks accidentally
used as dependencies. Popular offensive security projects are
also good candidates, since there may be malicious forks
hidden amongst projects meant for pedagogy or research.
Finally, we could further expand our scope to include all
popular repositories (using, e.g., Google’s BigQuery dataset
for GitHub repositories [31]).
AVAILABILITY
To help others build on our work, we have released the
code of our fork scanning infrastructure under an open source
license at:
https://github.com/ex0dus-0x/fork-sentry
Datasets containing the results of our initial fork scans,
binary similarity database, and analyses are available at:
https://zenodo.org/record/6391341
https://www.virustotal.com/gui/collection/f15433215537bc3dea2e71718778ca70f8241228cd2418b5b7f21a3a729a34da
ACKNOWLEDGEMENTS
We would like to thank our anonymous reviewers for their
time and effort and for the valuable feedback they provided.
We also thank the NYU OSIRIS lab for their support by
lending us computing infrastructure, which we used to run our
analyses.
REFERENCES
[1] C. Patterson. (2021) Github actions update: Helping maintainers combat
bad actors. [Online]. Available:
https://github.blog/2021-04-22-github-actions-update-helping-maintainers-combat-bad-actors/
[2] GitHub. (2021) Searching in forks. [Online]. Available:
https://docs.github.com/en/search-github/searching-on-github/searching-in-forks
[3] Malwarebytes Lab. (2019) Electrum bitcoin wallets under siege.
[Online]. Available:
https://blog.malwarebytes.com/cybercrime/2019/04/electrum-bitcoin-wallets-under-siege/
[4] Avast Threat Intelligence Team. (2021) Greedy cybercriminals host
malware on github. [Online]. Available:
https://blog.avast.com/greedy-cybercriminals-host-malware-on-github
[5] S. Pastrana and G. Suarez-Tangil, “A first look at the crypto-mining
malware ecosystem: A decade of unrestricted wealth,” in Proceedings
of the Internet Measurement Conference, 2019, pp. 73–86.
[6] GitHub. (2021) Github search. [Online]. Available:
https://github.com/search
[7] B. Kaplan and J. Qian, “A survey on common threats in npm and PyPI
registries,” in International Workshop on Deployable Machine Learning
for Security Defense, 2021, pp. 132–156.
[8] R. Duan, O. Alrawi, R. P. Kasturi, R. Elder, B. Saltaformaggio, and
W. Lee, “Towards measuring supply chain attacks on package managers
for interpreted languages,” in Network and Distributed Systems Sympo-
sium (NDSS), 2021.
[9] M. Henriksen. (2021) Finding evil Go packages. [Online]. Available:
https://michenriksen.com/blog/finding-evil-go-packages/
[10] M. O. F. Rokon, R. Islam, A. Darki, E. E. Papalexakis, and M. Faloutsos,
“Sourcefinder: Finding malware source-code from publicly available
repositories in github,” in 23rd International Symposium on Research
in Attacks, Intrusions and Defenses (RAID 2020), 2020, pp. 149–163.
[11] D. Gonzalez, T. Zimmermann, P. Godefroid, and M. Schäfer,
“Anomalicious: Automated detection of anomalous and potentially malicious
commits on GitHub,” in IEEE/ACM 43rd International Conference on
Software Engineering: Software Engineering in Practice (ICSE-SEIP),
2021, pp. 258–267.
[12] W. La Cholter, M. Elder, and A. Stalick, “Windows malware binaries in
C/C++ GitHub repositories: Prevalence and lessons learned,” in ICISSP,
2021, pp. 475–484.
[13] L. Ren, S. Zhou, and C. Kästner, “Poster: Forks insight: Providing
an overview of GitHub forks,” in 2018 IEEE/ACM 40th International
Conference on Software Engineering: Companion (ICSE-Companion),
2018, pp. 179–180.