Database Systems
Presentation 7
State of the art
Bence Molnár

Distributed Databases

Distributed systems
— Systems are not attached to a single device but spread across several networked devices
— Requirements:
  — High-speed network
  — Decreasing prices and increasing speed of CPUs

Why are they used?
— Economical (vs. a supercomputer)
— Huge computation capacity
— Increased reliability
— Joining different services (SOA)

Challenges, solutions

Distributed databases
— Data is stored on multiple physical computers (co-located or at different physical locations), but is logically integrated and consistent
— Pros:
  — Reduced communication costs
  — Available even if a node fails (robustness)
  — Modular design, flexible configuration (scalability)
  — Easier maintenance
— Cons:
  — Complex system
  — Multiple hardware and software solutions
  — Complicated user (permission) management

Databases in cloud

Cloud
— Large groups of remote servers are networked to allow centralized data storage and online access to computer services or resources
— Service models:
  — IaaS (Infrastructure as a service): Amazon EC2, Windows Azure VM, Google Compute Engine
  — PaaS (Platform as a service): Windows Azure, Google AppEngine, Cloud Foundry
  — SaaS (Software as a service): Google Apps, Facebook, Microsoft Office 365, OnLive; Dropbox, Google Drive?

Databases in cloud
— DB on virtual server (VPS, Virtual Private Server)
  — Oracle DB, PostgreSQL, MySQL, CouchDB, ...
— DB as a service
  — Amazon DynamoDB, Google App Engine Datastore, Microsoft SQL Azure

Accessing databases

Standard drivers
— Goal: managing databases in an OS- and DB-independent way
— File-based data is accessible as well (e.g. CSV, XLS, etc.)
— Drivers:
  — ODBC (Open Database Connectivity): MS supported
  — JDBC (Java Database Connectivity)
  — FDO (Feature Data Objects)

Standard drivers
— Applications: C/C++, Matlab, PHP, Ruby, ..., Java, .NET, ...
— Driver (ODBC, JDBC, FDO, ...)
— Data sources:
  — PostgreSQL, MySQL, ...
  — Microsoft Jet (Access)
  — CSV, other files...
  — Spatial databases (FDO)

Accessing DB in Matlab
— Database Toolbox
— Support for ODBC & JDBC
— Tables ↔ Matrices (equivalent)
— Database Explorer App

Accessing DB in Matlab (JDBC)
% 1. Download the JDBC driver, e.g. for PostgreSQL: http://jdbc.postgresql.org/download.html
% 2. Add the JAR file to classpath.txt
% 3. Set the connection timeout in seconds (optional)
%    (function name is lowercase; the JDBC form takes the driver class first)
logintimeout('org.postgresql.Driver', 5);
% 4. Set the returned data type
setdbprefs('DataReturnFormat', 'cellarray');
% 5. Connect to the database
connA = database('database', 'username', 'password', ...
    'org.postgresql.Driver', 'jdbc:postgresql://localhost/');
% 6. Validate the connection (optional)
ping(connA);
% 7. Run the query
selCols = 'packetid, b0, b1, b2, b3, b4, b5, b6';
cursorA = exec(connA, ['select ' selCols ' from exp1']);
% 8. Fetch the results into a cell array
% cursorA = fetch(cursorA, 10);   % alternative: fetch only the first 10 rows
cursorA = fetch(cursorA);
% 9. Access the data
DataMat = cursorA.Data;
% 10. Close cursor and connection (release resources)
close(cursorA);
close(connA);

Semi-structured databases

Properties
— Data and schema are not separated
— Pros:
  — The schema does not lock the information in
  — Flexible format: easy to modify the schema
  — Portable data transfer
— Queries are less efficient compared to SQL
— E.g.: OEM (Object Exchange Model), XML (Extensible Markup Language)

XML
— Standard
— Format:
  — Tag: <something></something>
  — Self-closing tag: <something/>
  — Tags may be nested but must not overlap, e.g.: <something1> <something2> </something2> </something1>
  — Single root element
  — XML declaration, processing instructions and comments
— XML Schema: XSD
— XHTML

XML
Example:
<?xml version="1.0" encoding="UTF-8"?>
<Recipes name="bread" preparing_time="5 min" cooking_time="3 hours">
  <title>Simple bread</title>
  <ingredient quantities="3" unit="cup">Flour</ingredient>
  <ingredient quantities="10" unit="decagramme">Yeast</ingredient>
  <ingredient quantities="1.5" unit="cup">Warm water</ingredient>
  <ingredient quantities="1" unit="teaspoon">Salt</ingredient>
  <Commands>
    <step>Mix
all ingredients together, then knead well!</step>
    <step>Cover with a cloth and let rest for an hour in a warm room!</step>
    <step>Knead again, put it in a tin pan, then bake it in the oven!</step>
  </Commands>
</Recipes>

XQuery
— Query languages
  — XPath
  — FLWOR expressions:
    — for $var in exp_sequence_nodes
    — let $var_single_value := exp_values
    — where exp_condition
    — order by exp_order
    — return exp_result

XQuery example
for $product in doc("catalog.xml")/catalog/product
let $name := $product/name
where $product/@dept = "ACC"
order by $name
return $name

Document oriented databases
— Storing documents
  — Standard formats: XML, JSON, etc.
  — Binary: PDF, MS Office, etc.
— Every document has a unique identifier (e.g. URI)

References
— http://en.wikipedia.org/wiki/Distributed_computing
— http://en.wikipedia.org/wiki/Distributed_database
— http://en.wikipedia.org/wiki/Cloud_computing
— http://en.wikipedia.org/wiki/Virtual_private_server
— http://en.wikipedia.org/wiki/Semi-structured_data
— http://en.wikipedia.org/wiki/XML
— http://en.wikipedia.org/wiki/XQuery
— http://en.wikipedia.org/wiki/FLWOR

Thank You!
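
Backup slide: the FLWOR query from code
A rough sketch of what the XQuery example (for / let / where / order by / return over catalog.xml) computes, written with Python's standard xml.etree.ElementTree module. ElementTree supports only a limited XPath subset and no XQuery, so each FLWOR clause is mimicked by hand. The catalog contents and the helper name product_names_in_dept are assumptions made up for illustration, since catalog.xml itself is not part of the slides. Unlike the XQuery version, which returns the name elements, this sketch returns their text.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for catalog.xml (its real contents are not in the slides).
CATALOG_XML = """<catalog>
  <product dept="WMN"><name>Fleece Pullover</name></product>
  <product dept="ACC"><name>Floppy Sun Hat</name></product>
  <product dept="ACC"><name>Deluxe Travel Bag</name></product>
</catalog>
"""


def product_names_in_dept(xml_text, dept):
    """Return the sorted product names of one department (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # root element is <catalog>
    # for $product in /catalog/product  +  where $product/@dept = dept
    products = root.findall(f"./product[@dept='{dept}']")
    # let $name := $product/name  (text content only)
    names = [product.findtext("name") for product in products]
    # order by $name  +  return $name
    return sorted(names)


if __name__ == "__main__":
    for name in product_names_in_dept(CATALOG_XML, "ACC"):
        print(name)
```

With the made-up catalog above, only the two "ACC" products are selected and they come back in alphabetical order. A real XQuery processor would evaluate the original expression directly; the hand-written version only shows how the five clauses map onto filter, bind, sort and return steps.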