Fastest and Smallest Database Engine

By o1j2m3
This is my first time posting a topic here; I hope to get some valuable feedback.

Read and write performance is very important for a database engine.

Currently, most database engines are designed to process tree/map/hash data, and they use clustering, indexing and in-memory processing to speed up database I/O.

Hard disk I/O becomes the bottleneck. Large databases containing 100GB of data or more usually give me a headache, and analyzing old backup files is another nightmare.

I am just wondering why there is no design where:
1. A data table at the core of the database stores each unique value once
2. All other index tables point to the data table
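
A minimal sketch of the two-part design listed above, in Python. The class names `UniqueValueStore` and `Table` and the sample rows are hypothetical and only illustrate the idea; this is not how any particular engine implements it.

```python
class UniqueValueStore:
    """Core data table: each distinct value is stored once and gets an integer id."""

    def __init__(self):
        self.values = []   # id -> value
        self.ids = {}      # value -> id

    def intern(self, value):
        # Reuse the existing id if the value is already stored.
        if value not in self.ids:
            self.ids[value] = len(self.values)
            self.values.append(value)
        return self.ids[value]


class Table:
    """An index table: rows hold only integer ids that point into the store."""

    def __init__(self, store):
        self.store = store
        self.rows = []

    def insert(self, *fields):
        self.rows.append(tuple(self.store.intern(f) for f in fields))

    def read(self, row_number):
        return tuple(self.store.values[i] for i in self.rows[row_number])


store = UniqueValueStore()
orders = Table(store)
orders.insert("2009-01-15", "Fred", 100)
orders.insert("2009-01-15", "Bill", 100)
print(orders.read(1))       # ('2009-01-15', 'Bill', 100)
print(len(store.values))    # 4 distinct values stored for 6 fields
```

Every insert pays for a lookup in the store before the row can be written, which is where the extra write cost of this layout comes from.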

Conceptually, this database engine design would greatly improve read and write speed, because the database is indexed from the start.

If we would normally need 100GB of hard disk space to store a database, with the new design we would need only 20GB.

The redundancy of data in a large database can be enormous. A single date or numeric value can occur 10 times or more in a database. The design will slow down inserts and updates, but it greatly improves read and maintenance performance.

The main thing is, it could be used in micro devices.

All Comments

Without hashing

by Tony Hopkinson In reply to Fastest and Smallest Data ...

you have to reorder your index on any change. That's slower; it requires a block move...

What would be the point of pointing to an integer? The pointer would be an integer! If you did do something like that, how would you know that Fred.Col13 and Bill.Col14, both currently 100, were always going to be the same? If they are, it's the schema that's wrong, not the engine. If they aren't, then you are talking copy-on-write, which is again slower.
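
To make the copy-on-write cost concrete, here is a small Python sketch. The `values` list, the `cells` mapping and the `update` function are hypothetical names for illustration; only Fred.Col13 and Bill.Col14 come from the discussion.

```python
values = [100]                                # shared unique-value table
cells = {"Fred.Col13": 0, "Bill.Col14": 0}    # both cells point at values[0]

def update(cell, new_value):
    # Copy-on-write: never overwrite a shared entry in place, because
    # other cells may still point at it. Find or append the new value,
    # then repoint only the cell being updated.
    if new_value in values:                   # lookup is the extra write cost
        cells[cell] = values.index(new_value)
    else:
        values.append(new_value)
        cells[cell] = len(values) - 1

update("Fred.Col13", 101)
print(values[cells["Fred.Col13"]], values[cells["Bill.Col14"]])   # 101 100
```

A plain in-place overwrite of `values[0]` would have changed Bill.Col14 as well, so each update needs the extra lookup and possibly an append.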

If you want fast write and no search/find functionality, just bung the data into a file. If you want to read it, though, you are going serial or implementing indexes of successively greater complexity that have to be maintained.

You can't have super fast 'search' and write in data that is sourced in an unordered fashion. It can be implemented, but there's little chance that it would be desirable as generic functionality.

Reindex with less block move

by o1j2m3 In reply to Without hashing

Thank you for your reply. I am glad to hear about these other concerns.

Most database designs involve a tablespace, "room" or cluster for data. We can use this feature to keep all the indexes loosely arranged, with some empty space between the pointers.

For example: 10 <5 null> 100 <5 null> 1000
So we can insert new data, let's say 14, into the nulls.
The index will then be
10 <2 null> 14 <2 null> 100 <5 null> 1000

The "block move" only happens when there are no more nulls between the pointers. This is to improve write speed.
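
The gapped layout above can be sketched in Python as follows. The class name `GappedIndex`, the `block_moves` counter and the choice to place a new key in the middle of its gap are assumptions made for illustration; the linear scan is only there to keep the sketch short.

```python
class GappedIndex:
    """Sorted keys stored with empty (None) slots between them."""

    def __init__(self, keys, gap=5):
        self.gap = gap
        self.block_moves = 0
        self._rebuild(sorted(keys))

    def _rebuild(self, keys):
        # The "block move": rewrite every key with `gap` empty slots after it.
        self.slots = []
        for k in keys:
            self.slots.append(k)
            self.slots.extend([None] * self.gap)

    def keys(self):
        return [s for s in self.slots if s is not None]

    def insert(self, key):
        # Locate the occupied slots on either side of `key`.
        prev_pos, next_pos = -1, len(self.slots)
        for pos, value in enumerate(self.slots):
            if value is None:
                continue
            if value < key:
                prev_pos = pos
            else:
                next_pos = pos
                break
        free = next_pos - prev_pos - 1
        if free > 0:
            # Drop the key into the middle of the gap; nothing else moves.
            self.slots[prev_pos + 1 + free // 2] = key
        else:
            # Gap exhausted: fall back to the expensive block move.
            self.block_moves += 1
            self._rebuild(sorted(self.keys() + [key]))


index = GappedIndex([10, 100, 1000])
index.insert(14)
print(index.slots[:7])      # [10, None, None, 14, None, None, 100]
print(index.block_moves)    # 0
```

Inserting 11 and then 12 uses up the remaining space between 10 and 14, so the second of those inserts triggers one block move.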

For searching, we could have another pair of integers that carry a weight (just like a vector). Using the weight, we could very quickly get the desired data from the unique data table. Searching by integer is faster than comparing char, date, blob and clob values.

We could also optimize an operation like "Select * from table where x like '%x%'" by searching through the unique data first and only then matching against the index pointers. I would expect the operation to be much more efficient.

This way of searching also reduces the hardware bottleneck.

But these may just be my wild ideas.
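
A rough Python sketch of that two-step substring search. The names `unique_values`, `rows_by_value_id`, `insert_row` and `like_contains`, and the sample city data, are hypothetical.

```python
from collections import defaultdict

unique_values = []                      # id -> value, each value stored once
value_ids = {}                          # value -> id
rows_by_value_id = defaultdict(list)    # id -> row numbers pointing at it

def insert_row(row_number, text):
    if text not in value_ids:
        value_ids[text] = len(unique_values)
        unique_values.append(text)
    rows_by_value_id[value_ids[text]].append(row_number)

def like_contains(fragment):
    # Step 1: run the string comparison against the distinct values only.
    hits = [i for i, v in enumerate(unique_values) if fragment in v]
    # Step 2: expand the matching ids to rows through the integer pointers.
    return sorted(r for i in hits for r in rows_by_value_id[i])

cities = ["London", "Londonderry", "Paris", "London", "Paris", "London"]
for n, city in enumerate(cities):
    insert_row(n, city)

print(like_contains("ondon"))   # [0, 1, 3, 5]
```

Here six rows need only three string comparisons, because there are three distinct values. The saving depends entirely on how much the column repeats; with mostly unique values it disappears.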

I may be misunderstanding but you appear to be

by Tony Hopkinson In reply to Reindex with less block m ...

contradicting yourself.

Hashing slows down read speed, in favour of write performance.

It appeared you were saying you would get quicker writes without it.

The extra weighted integers to improve reading would cost you when writing; they have to be generated and maintained.

It's always been true that if you optimise for read, or search, or write or anything else, you de-optimise somewhere else.

Two of the accepted ways to improve disk IO are compression and defragmentation.

Compression is an increase in complexity and impacts overall IO speed. Defragmentation requires spare IO and processor time in order to have room to operate in. But if you have the spare time, why are you optimising in the first place?

Nothing wrong with your idea, but it's more optimising for a specific usage; scaling, such as to a micro database, is almost irrelevant. You have a small drive and small processor after all. If you use up the processor to cope with the small drive, you have less processor time for other things.

So I'm saying if you optimise for a specific usage pattern, what you end up with will be great at that, and near useless at anything different.

You seem to be thinking along the lines that because this is the limited resource, all others can be treated as effectively unlimited.

LEXST is the fastest database engine exactly as you described

by yassola In reply to I may be misunderstanding ...

The Lexst database engine (http://www.lexst.com) searches the index first instead of the table content, and it can also be scaled to a very large cluster (20000 nodes maximum).

LEXST also solved the bottleneck problem of a huge number of concurrent searches.
