Talk:Entity–attribute–value model

Latest comment: 1 month ago by 2600:8807:CC0:EE30:31E9:9BA9:427B:5349 in topic recurring changes unknown too device

Similar models edit

Feature structure is a similar concept based on math formulas... Similarly there is Tuple space and at last quadstore and triple store. What is the difference between all those things? i⋅am⋅amz3 (talk)

How about the Drawbacks? edit

Maps very poorly to the relational model,

The primary objective of the schema is to map "real life" concepts (i.e. heterogenous sparsely populated attributes; what you'd find in a patient's chart) instead of coercing "real life" into a relational model. You are still able to leverage relational concepts because of the methods you need to use to pivot, join, as well as perform localized normalization. Cowbert (talk) 01:54, 2 May 2008 (UTC)Reply

Does not normalize well, Incurs overhead because it often requires data be implicitly or explicitly converted to/from character format, Many implementations waste space by providing several columns of different data types to allow for the different data types a value might be, You have to manually re-create a lot of the functionality built-in to a relational database, such as check constraints and foreign key constraints.

That is why in production you spin off EAV tables based on data types. EAV/CR directly addresses this issue: either the class table or the attribute table (if you are not using classes) stores metadata about which EAV table has all of those attribute's values. So you'd have an EAV_int, EAV_real, EAV_char, EAV_date etc. tables. When you enumerate the attribute for the entity, your business logic queries this metadata to determine which eav table to query for the value (example, patient_hemoglobin has a metadata column of 'eav_table' pointing to eav_real). Also, the constraints are now primarily metadata driven, instead of defined in the schema itself - again the primary goal is avoid having to coerce a certain representational system into a strict relational schema, where as the arguments that seem to be made are that "we must stick with relational schema" even though the whole point of the exercise is that a column-moduled schema poorly supports large and/or dynamic numbers of sparsely populated attributes (I can do a SELECT attribid, value FROM eavtable where entityid = foo vs. tossing all the null fields if I have a table where the attributes are column modeled [or worse, multiple tables of tons of sparse columns each that I'm going to have to index]). Finally, it has been admitted that the main difficulty more or less lies with attribute-centric queries (i.e. I *do* want to know every patient who is running a fever today, which brings up the necessity to pivot the attribute over into a columnar model or do some self-joins)) Cowbert (talk) 01:54, 2 May 2008 (UTC)Reply

The fact that vendors impose "limitations" and design databases poorly is not an advantage. By the way, the example given in the article is very EASILY converted to a properly normalized relational model. Perhaps someone should come up with a better example that is NOT easily converted to a normalized relational model? As an example:

Patient Table:
[Primary Key is (Patient_ID)]
Patient_ID
Patient_Name


Patient_Test Table:
[Primary Key is (Patient_ID, Test_ID)]
Patient_ID
Test_ID
Date
Test_Type
Quantity
Unit_of_Measure

Test_Type would be related to another table containing the various tests that could be performed, and Unit_of_Measure could relate to another table with UOM identifiers and possibly a conversion factor for a common UOM.

Patient
Patient_IDPatient_Name
1Patient XYZ
Patient_Test
Patient_ID Test_ID Date Test_Type Quantity Unit_of_Measure
1 1 1/5/1998 Hemoglobin 12.5 gm/dl
1 2 1/5/1998 Potassium 4.9 Meq/L


This is how the normalized eav schema of the above would look like:

table test_events {
eventid integer primary key,
patientid integer foreign key references patient_table(patientid),
timestamp date
}

test_metadata {
testid int pkey,
test_type varchar(),
test_datatype int fkey references eav_types,
test_unit_of_measure varchar()
}

eav_types {
datatype int pkey,
eav_table_name varchar()
}

patient_table {
patientid int pkey,
first_name varchar,
last_name varchar,
dob date
....
}

eav_real {
eventid int fkey references test_events(eventid),
testid int fkey references test_metatdata(testid),
value real
}

ok. So now for the data/meta:

patient_table VALUES { patientid = 1, first_name =... }
patient_event { eventid = 1, patientid = 1, date = 1/5/1998 } patient_event { eventid = 2, patientid = 1, date = 1/5/1998 }

test_metadata { testid = 1, test_type = 'hemoglobin', test_datatype = 1, test_unit_of_measure = 'g/dL'} test_metadata { testid = 2, test_type = 'potassium', test_datatype = 1, test_unit_of_measure = 'meq/L'}

eav_types { datatype = 1, eav_table_name = 'eav_real'}

eav_real { eventid = 1, testid = 1, value = 12.5 } eav_real { eventid = 2, testid = 2, value = 4.9 }

Cowbert (talk) 01:54, 2 May 2008 (UTC)
Reply

Downsides edit

Seem like good points, but should be written in 3rd person, and should be sourced! —Dfass 05:48, 24 December 2006 (UTC)Reply

I agree. It is self-referential (referring to a previous version of the same article) in a way that is inappropriate. It also seems that many of the points are already discussed in other sections, and the information in this section could be integrated there. Nimrand 21:04, 1 June 2007 (UTC)Reply
Just wanted to pop in and agree with the disappointment /w the Downsides section. It feels abundantly unprofessional and not up to the calibre I would normally expect from coverage of this topic from Wikipedia. More specifically, its placement, wording, and lack of sourcing make it sound like regressive sniping. Write some published papers on this and then cite it, until then (IMO) it shouldn't even be here...


The drawback number 2 (scalability for large datasets) can be greatly improved by using 2 queries for retrieving the data.

  • 1st is for filtering and sorting the records, here you would self join ONLY attributes that are used in filters and order by. Ordered result of this query creates empty array with placeholders for entities.
  • 2nd is regular SELECT (or UNION when attributes are splited to few tables by type) that is filtered by entity id, which were received from 1st query. Result from this query is being fetched into array created by 1st query, which makes the result correctly sorted.


I've removed the downsides section entirely for now. The article needs a downsides section to be sure, but the one it had was written in a personal/opinionated and unprofessional manner. I think it's better to leave it out until someone can rewrite it without original research.--Fyedernoggersnodden (talk) 01:36, 22 March 2010 (UTC)Reply

Accuracy of this article edit

The article makes a lot of unsubstantiated claims (which may or may not be true), many of which seem to stem from the desire to constantly compare EAV to traditional/flat/SQL databases. We can leave them tagged as {{fact}} for a while, but there's a lot of it and I was hoping someone would provide sources or just remove it.

Perhaps we could split the article into an article about EAV, and a separate article comparing it to other models. In other words, we might consider separating the factual and useful information about EAV from the subjective and oft-disputed comparisons to other models.

rename edit

Rename to Entity-Attribute-Value model. Proper nouns in this context, always capitalised. Andy Dingley (talk) 18:32, 25 July 2008 (UTC)Reply

I agree. A simple Google search for entity attribute value will return items in which all three are capitalized. 76.120.35.164 (talk) 06:40, 2 September 2008 (UTC)Reply

Biased edit

This article seems awfully biased towards specific domain (medicine) and specific works/people.

This article should be split in two: one on EAV modeling generally and another on EAV in medicine. It should also reference the newer variable schema NoSQL solutions (e.g. MongoDB and Solr) that address this problem also. — Preceding unsigned comment added by Jroughgarden (talkcontribs) 21:50, 3 November 2011 (UTC)Reply

It's like an example has over-taken the whole article...

Tail wagging the dog syndrome. —Preceding unsigned comment added by 173.15.94.125 (talk) 15:30, 8 October 2009 (UTC)Reply

Because the first commercial application of it was in that domain and is the primary model used for data requirements within that domain. Feel free to talk about it in a non-medical domain with real examples if you have any (AFAIK, I have not seen EAV papers on or implementations for GIS or operational management domains "out in the field"; many domains now directly use ORM-based strategies and completely abstract out the schema entirely...) Cowbert (talk) 13:06, 15 October 2009 (UTC)Reply
The Magento e-commerce system would be a good source of non-medical examples. It uses a very EAV to store very complex and variable data. —Preceding unsigned comment added by 158.158.240.230 (talk) 14:06, 19 October 2009 (UTC)Reply
The article appears to examine one implementation of EAV systems, and point out its flaws. The Large Record and repeating group models used in many modern Model 204 implementations removes requirement for large numbers of joins and allows for a very efficent method to record large datasets of time related data, requiring one group instance (only to house the actually changing data) instead of a complete, separately related table entry for each new datum. This results in a net reduction in the number of joins required to retrieve data. —Preceding unsigned comment added by 115.187.253.93 (talk) 13:49, 7 March 2010 (UTC)Reply

The last bits are also biased too much towards Microsoft solutions and - which is even worse - relational database solutions. There are a multitude of software today that offer sparse schemas and would solve those problems in a whim, but of course it will take a few years until the industry fully embraces it. --200.139.90.98 (talk) 20:05, 25 June 2010 (UTC)Reply

There is a reason why industry is slow to embrace non-RDBMS solutions: RDBMS solutions work scalably, and through mechanisms such as indexing, can compensate for some of the inherent inefficiencies of EAV. As I've already stated, XML can be a viable alternative in circumstances where the data volume is modest. If you know of mainstream non-relational, non-XML solutions that are used in production systems, go ahead and cite and reference them in the body of the article. Prakash Nadkarni (talk) 17:20, 22 July 2010 (UTC)Reply

Oh be serious, there are lots of non-relational and semi-relational databases in production systems. Sop many I can't even be bothered to start listing them. I think the POV and SOAP comments are born out by that lack of awareness alone! —Preceding unsigned comment added by 2.121.26.205 (talk) 23:33, 22 January 2011 (UTC)Reply

POV / Metaphor edit

I taggged this as "POV" because the following two phrases just don't belong in a serious discussion of technical concepts, but express a particular author/editor's point of view, and clearly fail the "WP:SOAP" test:

Metadata is so important to the operation of a production EAV system that in its absence, one is essentially trying to drive a car without brakes or a steering wheel.

The correctness of the metadata contents, in terms of the intended system behavior, is critical enough that the contents constitute the system's "crown jewels".

TML (talk) 06:40, 26 March 2010 (UTC)Reply


The offending sentences have been removed, since the point they emphasized was already stated earlier in the text, and I agree there is no need for additional dramatics. However, to argue that the issue of criticality of metadata to the operation of an EAV system is a POV is like asserting that the moon is made of green cheese. The schemas of production EMRs (EpicCare, Cerner) contain a very large metadata sub-schema for the very purpose of ensuring consistency. Please also take the trouble to inspect the design of the Department of Veteran Affairs' freely available VistA before jumping to the POV conclusion.

Finally, I quote from the XML chapter of Ben-Gan et al "SQL Server 2008 T-SQL Programming" (Microsoft Press) : this book is cited in the article. The author of this chapter, Dejan Sarka, is NOT a particular fan of EAV, and points out that the *drawback* of an EAV design is the need to create a metadata infrastructure. To quote:

"The so-called open schema solution is quite popular. In this solution, you have the main Products table with common attributes. Then you add an Entity-Attribute-Value (EAV) table, which has three columns: product ID, attribute name, and attribute value. This is a completely dynamic solution, and is quite simple. However, with this solution you cannot easily use declarative constraints. With a single EAV table, the attribute value column is either character or variant data type. In addition, you have to prevent the insertion of meaningless attributes, or binding attributes to products for which the attributes are meaningless. You can create an EAV table for every data type of attributes you need to solve the data type problems. You can add an additional table that holds valid combinations of products and attributes. You can add additional constraints using triggers, middle tier code, or client code.

"Whatever you do, your solution is not that simple anymore. Your database design does not reflect the business problem anymore; you do not design a database from logical point of view, you design it from a physical point of view. Because of this last fact, I would hardly call this solution relational. Note that I am not saying that if you use such a solution, you have to modify it. In the past, this was probably the best option you had if you needed dynamic schema. However, nowadays I prefer the solution I am going to introduce at the end of this section."

(He goes on to introduce an alternative, based on XML, which is discussed subsequently in the Wikipedia article, in the section "Alternative XML")


Prakash Nadkarni (talk) 06:40, 9 April 2010 (UTC)Reply


I wasn't arguing the accuracy of the issue, merely the language used to convey it.TML (talk) 03:22, 2 May 2011 (UTC)Reply

Alright, I'm curious. edit

What disease is the example table of symptoms describing? :D 121.221.35.11 (talk) 11:39, 16 December 2010 (UTC)Reply

   The symptoms are fairly typical of bacterial pneumonia (=infection of the lung)

Prakash Nadkarni (talk) —Preceding undated comment added 15:59, 24 June 2011 (UTC).Reply

Is EAV the same as "vertical table"? edit

I've read about "vertical tables" for example at http://www.developer.com/db/article.php/3736011/Using-Vertical-and-Horizontal-Table-Structures-in-Oracle.htm which looks to me the same as EAV. If they are in fact the same thing, maybe that should be mentioned in the article... I tried searching wikipedia to learn more about "vertical tables" and didn't find anything, but some googling made me think that EAV might be related or the same. — Preceding unsigned comment added by Slinkp (talkcontribs) 18:20, 12 January 2011 (UTC)Reply

You are right that "vertical table" is a synonym for an EAV table: "key" in the article you have cited is a synonym for "attribute". I've amended the introduction to add this phrase. Prakash Nadkarni (talk)

Diagrams edit

This article discusses relationships between different tables and comparisons between differing approaches. It would benefit enormously with the inclusion of diagrams, even if hand-drawn. FreeFlow99 (talk) 15:52, 24 September 2013 (UTC)Reply

Copyright violation? edit

Large parts of this article appear to have been copied from Guidelines for the Effective Use of Entity-Attribute-Value Modeling for Biomedical Databases by Valentin Dinua and Prakash Nadkarni, whose publications (including this one) are heavily cited in this article. For example, the Entity–attribute–value model#EAV versus row modeling and Entity–attribute–value model#The Entity sections are nearly verbatim. This appears to be a copyright violation, and the material should probably be removed or heavily rewritten. RossPatterson (talk) 19:47, 25 July 2015 (UTC)Reply

On 17 July 2015 Prakash Nadkarni made no fewer than 17 edits, among which a few major additions of text, in 2.5 hours. So I fully concur with the above. In addition, the style of writing is that of some scientific text or reader ("we discuss how values are stored shortly", "we shall see later", "you would need to", "you have to") and not suitable for an encyclopedic lemma. Too little EAV substance, too much bio-medical example. I suggest to move all the bio-medical material to a separate lemma and focus here on EAV proper only. Pvanlaake (talk) 09:11, 17 June 2016 (UTC)Reply
As co-author of the material in question (which was written towards Valentin Dinu's PhD dissertation - I was his thesis co-advisor) I believe that reproduction of the material would fall under the fair-use policy. This is not self-plagiarism either: the original source is not concealed.
The reason for the biomedical examples is that biomedicine, which is my own field, forms a major set of use cases for EAV (which is one of the reasons it is so little emphasized in courses on, say, business database design.) I wouldn't object, though, to moving biomaterials to a separate lemma - as long as you are prepared to replace biomedical examples earlier in the text with non-biomedical examples. And if you think the style is too 'scientific', go ahead and clean it up. Prakash Nadkarni (talk) Jan 17, 2017. —Preceding undated comment added 22:33, 17 January 2017 (UTC)Reply

Dynamic Relational (A Similar Concept) edit

An informal but involved discussion on the proposed "Dynamic Relational" specification can be found at http://c2.com/cgi/wiki?DynamicRelational

The "short" description is that tables can be conceptually modeled as a bunch of XML statements such as:

<employee lastname="Smith" firstname="John" salary=55000/>
<employee lastname="Adams" firstname="Lisa" middlename="Lee" salary=78100/>
<employee lastname="Bates" firstname="Gill" salary=30000/>

The implementation doesn't have to actually be XML, just resemble it (plus an automatic unique row identifier, not shown). It could be implemented with an Entry/Attribute/Value table(s) "under the hood".

The above is a section from an example "employee" table. All attributes (columns) are optional unless an explicit requirement is issued to require such. Type validation can also optionally be added this way. This allows a table to incremental grow more rigid, closer to a regular RDBMS table. With some minor adjustments, SQL or a variation could the query language to make it familiar to users. One can "create" a column (attribute) simply by including it in an INSERT or UPDATE statement (unless declared forbidden for a given table). --146.233.255.212 (talk) 22:51, 22 August 2016 (UTC)Reply

External links modified edit

Hello fellow Wikipedians,

I have just modified one external link on Entity–attribute–value model. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 21:01, 24 December 2016 (UTC)Reply

Messy article edit

I have to say, this article is a bit of a mess, consisting of a lot of unconnected sections that are partly redundant. I tried to restructure the article by moving some sections around, but I think it'd help if it was cleaned up more thoroughly. Slhck (talk) 14:31, 25 April 2019 (UTC)Reply

I 2nd this opinion. This article is extremely long and dense WP:NOTEVERYTHING. It appears like many sections were pulled right out of textbooks and whitepapers, and often contain opinionated statements on the proposed designs. It needs major re-write Knoxinbox (talk) 17:10, 13 September 2019 (UTC)Reply

Though I agree, at least someone did it and it's somewhat comprehensible to me so let's at least acknowledge that. A few diagrams might help though. 92.2.118.221 (talk) 09:44, 21 October 2021 (UTC)Reply

suggest a minor change edit

> and the engine has components such as cylinders

i suggest re-phrasing this, because, obviously, not all engines necessarily 'have' cylinders. Previously Used Coffins (talk) 10:17, 17 July 2023 (UTC)Reply

Google App Engine is not a data store edit

The article refers to Google App Engine as though it were a database, but that is not correct. Google does offer a number of databases in their cloud platform. Any idea which DB the author intended to reference? Mrusso-wiki (talk) 18:47, 11 October 2023 (UTC)Reply

recurring changes unknown too device edit

Pinned pages, reoccurring changes too factory settings internal properties changes made too certification. Using work done on actual engine (truck)( Sea horse) all too use Target change device digital integrations. Mobile cell wireless needs too get restored back too factory, source coxbox too interrupt paid services wifi internet. Same interrupt dta internet exhausting my UNLIMITED (DATA )SERVICE. BY( WORK )CHANGED MANIFOLD TOO HEADERS MUFFLER (EXHAUST) TOO GLASS( PAC) REALLY REALLY — Preceding unsigned comment added by 2600:8807:CC0:EE30:31E9:9BA9:427B:5349 (talk) 13:53, 24 March 2024 (UTC)Reply