Blame optimization-guide/sl/harmful.page
Branch: 1470ead63cdc56c84b491edf46f35990ec57fc33
<page xmlns="http://projectmallard.org/1.0/" type="guide" style="task" id="harmful" xml:lang="sl">
<info>
<link type="guide" xref="index#harm"/>
</info>
<title>Disk Seeks Considered Harmful</title>
<p>
Disk seeks are among the most expensive operations a program can perform. You might not guess this from how many of them we perform, but they are. Consequently, please refrain from the following suboptimal behavior:
</p>
<list type="unordered">
<item>
<p>
Placing lots of small files all over the disk.
</p>
</item>
<item>
<p>
Opening, calling <code>stat()</code> on, and reading lots of files all over the disk.
</p>
</item>
<item>
<p>
Doing the above on files that were laid out at different times, which makes it likely that they are fragmented and cause even more seeking.
</p>
</item>
<item>
<p>
Doing the above on files that are in different directories, which makes it likely that they are in different cylinder groups and cause even more seeking.
</p>
</item>
<item>
<p>
Repeatedly doing the above when it only needs to be done once.
</p>
</item>
</list>
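<p>
The last item is often the easiest to fix. The following is a minimal sketch, not taken from any particular GNOME library, of reading a file once and handing out the in-memory copy on later calls. The names <code>get_config</code>, <code>read_whole_file</code>, and <code>disk_reads</code> are hypothetical, and the cache assumes a single file that does not change while the program runs.
</p>
<code mime="text/x-csrc"><![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical single-entry cache; all names here are illustrative. */
static char *cached_data = NULL;
static size_t cached_size = 0;
static int disk_reads = 0;   /* how many times the file was really read */

/* Read the whole file into a malloc'd, NUL-terminated buffer.
 * Returns NULL on any error. */
static char *read_whole_file(const char *path, size_t *size_out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long len = ftell(fp);
    if (len < 0) { fclose(fp); return NULL; }
    rewind(fp);

    char *buf = malloc((size_t)len + 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    size_t got = fread(buf, 1, (size_t)len, fp);
    fclose(fp);
    if (got != (size_t)len) { free(buf); return NULL; }

    buf[len] = '\0';
    *size_out = (size_t)len;
    disk_reads++;
    return buf;
}

/* Return the file contents, touching the disk only on the first call.
 * Assumption: the same path is passed every time; later calls ignore it. */
const char *get_config(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    if (cached_data == NULL) {
        cached_data = read_whole_file(path, &cached_size);
        if (cached_data == NULL)
            return NULL;
    }
    *size_out = cached_size;
    return cached_data;
}
```
]]></code>
<p>
Every caller after the first gets the same pointer back without an <code>open()</code>, <code>stat()</code>, or read. A real cache would also need an invalidation rule, which this sketch leaves out.
</p>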
<p>
Ways in which you can optimize your code to be seek-friendly:
</p>
<list type="unordered">
<item>
<p>
Consolidate data into a single file.
</p>
</item>
<item>
<p>
Keep data together in the same directory.
</p>
</item>
<item>
<p>
Cache data so that it does not need to be reread constantly.
</p>
</item>
<item>
<p>
Share data so that it does not have to be reread from disk each time an application loads.
</p>
</item>
<item>
<p>
Consider caching all of the data in a single binary file that is properly aligned and can be mapped with <code>mmap()</code>.
</p>
</item>
</list>
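<p>
The last suggestion can be sketched roughly as follows. This is one possible layout under stated assumptions, not a format used by any existing cache: the file holds an 8-byte header followed by fixed-size records in host byte order, and the names <code>record_t</code>, <code>cache_map_t</code>, <code>cache_write</code>, <code>cache_open</code>, and <code>cache_close</code> are hypothetical. Only standard POSIX calls (<code>open</code>, <code>fstat</code>, <code>mmap</code>, <code>munmap</code>) are used.
</p>
<code mime="text/x-csrc"><![CDATA[
```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical fixed-size record; 8 bytes, naturally aligned. */
typedef struct {
    uint32_t id;
    uint32_t value;
} record_t;

/* Handle for a mapped cache file. */
typedef struct {
    void *base;              /* start of the mapping */
    size_t length;           /* size of the mapping in bytes */
    uint32_t count;          /* number of records in the file */
    const record_t *records; /* points into the mapping, after the header */
} cache_map_t;

#define CACHE_HEADER_SIZE (2 * sizeof(uint32_t))

/* Write all records to one file in a single sequential pass.
 * Returns 0 on success, -1 on error. */
int cache_write(const char *path, const record_t *recs, uint32_t count)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    /* Header: record count plus padding, so records start at offset 8. */
    uint32_t header[2] = { count, 0 };
    int ok = fwrite(header, sizeof header, 1, fp) == 1 &&
             (count == 0 || fwrite(recs, sizeof *recs, count, fp) == count);
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Map the cache file read-only. Returns 0 on success, -1 on error. */
int cache_open(const char *path, cache_map_t *out)
{
    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || (size_t)st.st_size < CACHE_HEADER_SIZE) {
        close(fd);
        return -1;
    }

    size_t length = (size_t)st.st_size;
    void *base = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return -1;

    /* Reject files that are too short for the record count they claim. */
    uint32_t count = ((const uint32_t *)base)[0];
    if (CACHE_HEADER_SIZE + (size_t)count * sizeof(record_t) > length) {
        munmap(base, length);
        return -1;
    }

    out->base = base;
    out->length = length;
    out->count = count;
    out->records = (const record_t *)((const char *)base + CACHE_HEADER_SIZE);
    return 0;
}

/* Release the mapping. */
void cache_close(cache_map_t *m)
{
    munmap(m->base, m->length);
}
```
]]></code>
<p>
After <code>cache_open()</code> succeeds, <code>m.records[i]</code> can be read directly with no parsing step, and other processes that map the same file share its pages through the page cache. Because the records are stored in host byte order, a file written this way is not portable between machines of different endianness.
</p>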
<p>
The trouble with disk seeks is compounded for reads, which is unfortunately what we are doing. Remember, reads are generally synchronous while writes are asynchronous. Because each read must complete before the program can continue, the reads are serialized and every seek adds directly to program latency.
</p>
</page>